With the increasing cost of energy and awareness of environmental considerations, power saving is becoming more and more critical. Power saving has always been critical for mobile devices, but now data-center devices such as servers, storage and networking equipment are being required to support power management implementations.

In PCI Express®, Active State Power Management (ASPM) has been incorporated as a method to better manage power consumption of all system components. ASPM allows individual serial Links in a PCI Express fabric to have power incrementally reduced as a Link becomes less active. ASPM is defined in the PCI Express Base Specification.

Active State Power Management

The PCI Express Base Specification defines two levels of ASPM, which are designed to provide options for trading off increased power conservation against rapid recovery to the L0 state.

L0 Standby (L0s): Support for this state is required of all PCI Express devices, and the state applies to a single direction on the link. The latency to return to L0 from L0s is specified to be very short.

When entering L0s, the device moving into this power-saving state sends an Electrical Idle Ordered Set (EIOS) to the receiving device and then turns off power to its transmitter. When returning from L0s to L0, the device must first transmit a specific number of short Ordered Sets known as Fast Training Sequences (FTS). The number required is defined by the receiving device and advertised in the training sequences (TS1 and TS2 ordered sets) exchanged during link training.

The more FTS that are transmitted, the easier it is to obtain receiver lock on the transmitted signal. However, the object of L0s is to regain receiver lock and be able to receive traffic as quickly as possible, so the receiving device will select the smallest possible number of FTS that will ensure clock recovery based on its specific design.

L1 ASPM: This state is optional and can be entered to achieve a greater degree of power conservation. In this state, both directions of the link are placed into the L1 state. Return to L0 requires both devices to go through the Link Recovery process. This results in a greater latency to return to L0, so this power state would typically be used when activity on the link is not expected for some significant time period.

To enter the L1 state, the downstream device must first request permission from the upstream device to enter the deeper power conservation state. Upon acknowledgement, both devices turn off their transmitters and enter electrical idle. Returning from L1 requires both devices to go through the Link Recovery process. The Link Recovery process uses standard TS1 and TS2 ordered sets, as opposed to the smaller Fast Training Sequences used by L0s.

Using Protocol Analyzers with ASPM

When a protocol analyzer is used to monitor a PCI Express bus, the protocol analyzer cannot actively participate in the link. As the Root Complex and End Point negotiate the Link and broadcast their individual receiver lock requirements (including the number of FTS), the analyzer must remain passive and simply record this traffic. When devices go into power saving mode and turn their transmitters off, this entry into electrical idle will cause the analyzer to lose receiver lock with the devices. Since the analyzer cannot broadcast its own requirement for FTS, the analyzer must be capable of regaining lock more quickly than the receiving devices, or data may be lost.

Obviously, the more FTS that are transmitted on the bus, the greater the ability of any device to attain receiver lock. This is true for the receiving device as well as the protocol analyzer. However, to meet the objective of bringing the PCIe® link back to L0 as quickly as possible, the receiving device uses the smallest number of FTS possible, based on its own design. The smaller the number of FTS that actually occur on the bus, the greater the chance that the protocol analyzer will lose traffic prior to regaining its own signal lock.

The ability to very rapidly regain signal lock is therefore critical to the design of PCI Express protocol analyzers. This ability will vary with designs from different manufacturers, and is therefore an important consideration for any engineer considering purchase of a PCI Express analyzer.

The following information details studies performed by LeCroy to identify and detail the number of FTS required to attain 100% receiver lock by both the LeCroy Summit™ T2-16 Protocol Analyzer and the Agilent E2960B Protocol Analyzer with the optional N5322A ASPM Module.

Test Methodology

Goal

The goal of this testing was to measure the lock time of the PCI Express analyzers at both PCI Express 1.0 (2.5 GT/s) and PCI Express 2.0 (5.0 GT/s) data rates. This lock time determines the minimum number of FTS ordered sets that the devices under test (DUTs) must transmit in order to provide a clean capture of the PCI Express protocol traffic. Since the minimum FTS requirement may vary from device to device, the measure of the analyzer’s ability to regain lock with a very small number of FTS will provide an indication of the analyzer’s ability to cleanly capture traffic in typical systems.
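As a rough guide to what an FTS count means in time, the sketch below converts a number of FTS ordered sets into time on the wire. It assumes 8b/10b encoding (10 bits per symbol) and an FTS ordered set of four symbols (one COM followed by three FTS symbols). The function name is illustrative and not part of any analyzer software.

```python
SYMBOLS_PER_FTS = 4   # assumption: COM + 3 FTS symbols per ordered set (8b/10b)
BITS_PER_SYMBOL = 10  # 8b/10b encoding

def fts_duration_ns(n_fts: int, rate_gt_per_s: float) -> float:
    """Return the time, in nanoseconds, occupied by n_fts FTS ordered sets."""
    symbol_time_ns = BITS_PER_SYMBOL / rate_gt_per_s
    return n_fts * SYMBOLS_PER_FTS * symbol_time_ns

if __name__ == "__main__":
    for rate in (2.5, 5.0):
        for n_fts in (4, 16, 64):
            print(f"{rate} GT/s, N_FTS={n_fts}: {fts_duration_ns(n_fts, rate):.0f} ns")
```

Under these assumptions, the same FTS count gives the analyzer half as much time to lock at 5.0 GT/s as at 2.5 GT/s.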

Overview

To determine the lock time, a test environment was designed that created controlled data traffic resembling what a system generates while its L0s active state power management (ASPM) mode is enabled.

To properly measure the lock time, this traffic needed to be repeatable, unique, and periodic in nature so that errors could be checked reliably. While simply recording a system with ASPM enabled would provide some means of measuring performance, it is not as robust as a test environment where the exact details of the traffic are known before the data is recovered.

Data Traffic

To best approximate PCI Express L0s data traffic, the following stream of data was sent through the analyzer in an infinite loop:

  • 2 μs of electrical idle time
  • “N” number of Fast Training Sequences (FTS ordered sets)
  • 1 SKIP ordered set
  • 1 Vendor DLLP with an incremented value in the vendor data field on each repeat
  • 2 μs of logical idle traffic (D0.0)
  • 1 or 2 Electrical Idle ordered sets (1 for PCIe 1.0, 2 for PCIe 2.0 as the PCIe spec requires)

The unique packet in this stream (shown in Figure 1 below) is a Vendor DLLP with an incrementing counter value inserted in the Vendor Data field. The uniqueness of this data packet is required to determine whether ALL of the electrical idle exit conditions were properly recovered. Without this unique data, the analyzer could miss an entire electrical idle exit event and the loss would go undetected.
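The repeating pattern can be sketched as a generator, which makes the role of the incrementing counter explicit. This is only a model of the sequence listed above, not the Exerciser's actual programming interface. The item names and the `l0s_traffic` function are hypothetical.

```python
from itertools import count, islice

def l0s_traffic(n_fts: int, gen2: bool):
    """Yield (item, detail) tuples for the looping test pattern.

    All names are illustrative; durations are carried as labels only.
    """
    for counter in count():
        yield ("electrical_idle", "2 us")
        yield ("fts", n_fts)                # "N" FTS ordered sets
        yield ("skip", 1)                   # one SKIP ordered set
        yield ("vendor_dllp", counter)      # vendor data increments each loop
        yield ("logical_idle", "2 us")
        yield ("eios", 2 if gen2 else 1)    # 1 for PCIe 1.0, 2 for PCIe 2.0

if __name__ == "__main__":
    # Print the first two iterations of the loop for PCIe 2.0 with N = 8.
    for item in islice(l0s_traffic(8, gen2=True), 12):
        print(item)
```

Because each loop carries a different counter value, a missing value in the recorded sequence identifies exactly which electrical idle exit was lost.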

Figure 1:

L0s traffic as used for this testing

Hardware Setup

To generate the above traffic, a special build of the Summit Z2-16 Exerciser BusEngine™ was compiled. This BusEngine provided programmability in the electrical idle duration, number of FTS ordered sets, and link speed. The Summit Z2-16 traffic was sent to the Host Emulation platform. Both the LeCroy Summit T2-16 Analyzer and Agilent E2960B Analyzer used their active slot interposers to recover this traffic by being inserted into the PCIe slot on the top of the Host Emulator.

Downstream traffic was looped back after the active slot interposer, but the resulting upstream traffic was not recorded for the purpose of this testing.

The Agilent E2960B System used the Agilent N5322A ASPM module. This module is an add-on option required for ASPM testing. The Agilent E2960B Analyzer was running software version 6.13.

The LeCroy Summit T2-16 Analyzer includes the ability to manage ASPM as a standard feature, so the standard hardware was used, running software version 5.62.

Figure 2:

Hardware setup for LeCroy Summit T2-16 Analyzer

Figure 3:

Hardware setup for Agilent E2960B with N5322A option

Identical tests were performed on both products with the following parameters set:

  • Link width: x8
  • Scrambling: Enabled
  • Auto Speed Detection: Disabled (Link speed was forced on both analyzers)
  • Analyzer was calibrated at PCIe 2.0 x8 with L0 traffic prior to the test.
  • Electrical Idle duration: 2 μs
  • Logical idle duration: 2 μs

The following settings were varied during the test:

  • Link Speed: PCIe 1.0 and PCIe 2.0
  • Number of FTS: Varied from 4 to 64
  • RefClk Setting: External and Internal
  • SSC modulation: Enabled or Disabled
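The varied settings can be enumerated as a simple test matrix. The sketch below is a hypothetical illustration: the source gives only the FTS range of 4 to 64, so the step of 4 is an assumption, and it is not stated whether every combination was actually run.

```python
from itertools import product

LINK_SPEEDS = ("PCIe 1.0", "PCIe 2.0")
FTS_COUNTS = range(4, 65, 4)      # assumption: step of 4 (only the 4..64 range is given)
REFCLK_SETTINGS = ("External", "Internal")
SSC_SETTINGS = (True, False)      # SSC modulation enabled or disabled

def test_matrix(fts_counts=FTS_COUNTS):
    """Return one dict per combination of the varied settings."""
    return [
        {"link_speed": speed, "n_fts": n_fts, "refclk": refclk, "ssc": ssc}
        for speed, n_fts, refclk, ssc in product(
            LINK_SPEEDS, fts_counts, REFCLK_SETTINGS, SSC_SETTINGS
        )
    ]

if __name__ == "__main__":
    matrix = test_matrix()
    print(len(matrix), "configurations; first:", matrix[0])
```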

Data Recording and Analysis

For each test, the analyzer recorded a snapshot of the L0s traffic with the analyzer settings chosen to match the link state. The recording buffer was increased to ensure that at least 1000 L0s electrical idle transitions were captured in the recording. In both the Agilent and the LeCroy cases, the traffic was post-processed using software to determine the failure rate for each test. A failure is defined as a Vendor DLLP that was not recorded with the expected vendor data. This test does not verify the correctness of the FTS ordered sets or the logical idle traffic. Instead, it assumes that the Link was properly locked if the analyzer was able to perform symbol alignment and lane-to-lane deskew and to recover the Vendor DLLP correctly.

Agilent E2960B Analyzer

Each recording was exported to a .csv file, which was parsed using a Perl script.

Prior to saving the .csv file, some display options were applied to the trace to hide the upstream traffic, display only the Vendor DLLPs, and hide unneeded fields in the trace display. This was done simply to reduce the size of the .csv files as well as the complexity of the Perl script.

The Perl script counted the total number of DLLPs recovered. Based on the time between consecutive DLLPs and on the vendor value, it determined whether one or more Vendor DLLPs were not properly recorded. The failure rate was calculated by dividing the total number of missing or corrupted packets by the total number of recovered packets.
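The original Perl script is not reproduced here. The following Python sketch illustrates the same kind of check under stated assumptions: the column names `timestamp_ns` and `vendor_data`, the fixed loop period, and the absence of counter wrap-around are all hypothetical, and the first recorded DLLP is taken as the reference.

```python
import csv
import io

def failure_rate(csv_text: str, period_ns: float) -> float:
    """Return (missing + corrupted) / recovered for a Vendor DLLP trace.

    Assumes one Vendor DLLP per loop period and an incrementing counter.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    t0 = float(rows[0]["timestamp_ns"])
    v0 = int(rows[0]["vendor_data"])
    missing = corrupted = 0
    prev_index = 0
    for row in rows[1:]:
        # Position of this DLLP in the loop sequence, derived from its timestamp.
        index = round((float(row["timestamp_ns"]) - t0) / period_ns)
        missing += max(index - prev_index - 1, 0)   # loops with no DLLP recorded
        if int(row["vendor_data"]) != v0 + index:   # value does not match position
            corrupted += 1
        prev_index = index
    return (missing + corrupted) / len(rows)

if __name__ == "__main__":
    sample = "timestamp_ns,vendor_data\n0,10\n4000,11\n12000,13\n16000,99\n"
    print(failure_rate(sample, period_ns=4000))
```

In the sample, one DLLP is missing (value 12) and one is corrupted (99 instead of 14), so two failures are counted against four recovered packets.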

LeCroy Summit T2-16 Analyzer

Each recording was processed with the LeCroy verification script engine. A specific verification script was created to determine how many Vendor DLLPs were recorded and how many were missed or recorded improperly. The algorithms used in the verification script were essentially identical to the algorithms in the Perl script used to process the Agilent .csv files.

Testing was automated through the use of the PCI Express compliance test application to reduce the likelihood of user error. Tests were performed 4 times to determine how repeatable the results were, and the results were averaged for the final score.

Results and Conclusions

[Charts: Compare <1% Failure Rate; Compare 0% Failure Rate]