5.3.1.2 Training Logic

The training logic is implemented as a soft FPGA core that de-skews read data and center aligns the clock using IOD capabilities in the read direction. A known training pattern, generated by the QDR controller, is used for training.

The IOD PHY receives the read data and performs a serial-to-parallel conversion. The following figure shows the receive path in detail for the case where the QDR data width is 36. The IOD PHY collects the bytes, for a total of 72 bits received per lane, and passes them to the respective SHIM block. If required, the training logic shifts the pattern by full bit times in the SHIM. This is called data framing, and it is performed iteratively until the received pattern matches the expected pattern. The training logic then compares the bytes to the expected pattern and adjusts delay taps in the IOD PHY to center align the data (“Q”) around the memory clock CQP. This is called Q alignment. This operation is performed sequentially for the entire 288-bit word (4 lanes x 72 bits), byte by byte (lane by lane).

Important: QDR training is performed on receive path only. Following QDR vendor-specific PCB guidelines ensures proper write operations.
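
The lane arithmetic for the 36-bit case can be checked with a short sketch. It assumes that Q pads are assigned to lanes contiguously (Q0 to Q8 in the first lane, and so on) and that each pad contributes an 8-bit burst; the names `lane_of`, `PADS_PER_LANE`, and `BURST_LEN` are illustrative and do not come from the IP.

```python
NUM_LANES = 4       # lanes in the data width = 36 case
PADS_PER_LANE = 9   # Q pads (IODs) per lane, assumed contiguous
BURST_LEN = 8       # bits deserialized per pad


def lane_of(q_index):
    """Return (lane, pad position within the lane) for data pad Q<q_index>."""
    if not 0 <= q_index < NUM_LANES * PADS_PER_LANE:
        raise ValueError("Q index out of range")
    return divmod(q_index, PADS_PER_LANE)


bits_per_lane = PADS_PER_LANE * BURST_LEN   # bits collected per lane
word_bits = NUM_LANES * bits_per_lane       # bits in the full word

if __name__ == "__main__":
    print(bits_per_lane, word_bits)
    for q in (0, 9, 18, 27):
        print(f"Q{q} -> lane {lane_of(q)[0]}")
```

Under these assumptions, the first pad of each lane is Q0, Q9, Q18, and Q27.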

The QDR RAM uses the CQP and CQN clock pair to capture data in the PHY. IODs in the PHY only use CQP for sampling data. The Lane Controller adjusts delay taps to center CQP within the valid data eye. This tuning occurs during CQ alignment.

Training Pattern

- Q alignment: Adjusts the delay taps of the received serial bits. A pseudorandom generator produces a standard PRBS8 sequence with a period of 256 words.
  - Generator polynomial is x⁸ + x⁶ + x⁵ + x⁴ + 1.
  - Seed value is 0xAA.
- CQ alignment: Centers the clock with respect to the received data. The training pattern is fixed to 0xA5.
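
As an illustration of the Q alignment pattern source, the following is a minimal sketch of an 8-bit Fibonacci LFSR built from the polynomial and seed listed above. The shift direction, the tap-to-bit mapping, and the choice to emit one 8-bit word per LFSR step are assumptions, so the exact words may differ from those produced by the QDR controller. A plain LFSR of this kind returns to its seed after 255 steps; the sketch does not attempt to reproduce the 256-word period stated above.

```python
def prbs8_words(seed=0xAA, count=8):
    """Yield `count` 8-bit words from an LFSR for x^8 + x^6 + x^5 + x^4 + 1."""
    state = seed & 0xFF
    for _ in range(count):
        yield state
        # Taps 8, 6, 5, 4 map to state bits 0, 2, 3, 4 in this
        # right-shifting form (an assumed convention).
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
        state = (state >> 1) | (bit << 7)


def lfsr_period(seed=0xAA):
    """Count the steps until the LFSR state returns to the seed."""
    gen = prbs8_words(seed, count=1 << 16)
    first = next(gen)
    for steps, word in enumerate(gen, start=1):
        if word == first:
            return steps
    return None


if __name__ == "__main__":
    print([hex(w) for w in prbs8_words(count=8)])
    print(lfsr_period())
```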

Training FSM

The FSM iterates across lanes and across each IOD data pad. Data framing, Q alignment, and CQ alignment all use the same methodology: a sequence is written into memory and read back, and the received data is compared with the transmitted data. Tuning actions are then taken based on this comparison. What differentiates the three phases is the block where these tuning actions are performed.

- Data framing: Identifies the beginning of the data frame. A data valid signal, which is a delayed version of the read command, gates the sampling of the data. In Figure 5-3, it applies to Q0 (Lane 0), Q9 (Lane 1), Q18 (Lane 2), and Q27 (Lane 3). The CQP clock samples the data while the training logic looks for a match between the write and read sequences, trying all the IOD delay taps. If there is no match, the read sequence and/or the data valid signal are shifted by an integer number of bits using the shift register in the read SHIM layer. This process continues until there is a match, at which point data valid is, by definition, framing the data.
- Q alignment: Finds the delay tap setting for each bit that results in the widest data valid window across the data byte. Only the bits sampled on the falling edge are used to compare the written and read data.
- CQ alignment: Performed after the delay taps for all IODs in a lane are set during Q alignment. This step fine-tunes the delay taps of CQP. Because only CQP is used to capture the data, only the delay taps on the CQP IOD are adjusted. Only the bits sampled on the rising edge are used to compare the written and read data.
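
The comparisons used by the three phases can be sketched as follows. This is a software illustration only, not the FSM itself. It assumes that the falling-edge bits are bits 7, 5, 3, 1 of each byte (as in the pseudocode in this section), that the rising-edge bits are the remaining bits 6, 4, 2, 0, and that the serial stream is written most significant bit first. The function names and the `max_shift` limit are hypothetical.

```python
FALLING_MASK = 0xAA  # bits 7, 5, 3, 1
RISING_MASK = 0x55   # bits 6, 4, 2, 0 (assumed to be the rising-edge bits)


def matches(received, expected, mask):
    """Compare only the bits selected by `mask`."""
    return (received ^ expected) & mask == 0


def find_frame_offset(serial_bits, expected_byte, max_shift=7):
    """Return the smallest whole-bit shift at which an 8-bit window of
    `serial_bits` (a string of '0'/'1') equals `expected_byte`, or None."""
    for shift in range(max_shift + 1):
        window = serial_bits[shift:shift + 8]
        if len(window) == 8 and int(window, 2) == expected_byte:
            return shift
    return None


if __name__ == "__main__":
    # 0xA5 arriving three bit times late is framed with a shift of 3.
    print(find_frame_offset("000" + "10100101" + "0000", 0xA5))
    # An error in bit 0 is invisible to the falling-edge comparison.
    print(matches(0xA4, 0xA5, FALLING_MASK), matches(0xA4, 0xA5, RISING_MASK))
```

Masking the comparison is what lets Q alignment and CQ alignment share the same write, read, and compare sequence while judging different clock edges.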

The pseudocode for the whole process (data framing, Q alignment, and CQ alignment) is as follows:

for k=1:num_lanes
  for g=1:num_IODs_in_lane(9)
    for j=1:delay_taps(256)
      for i=1:byte_len(8)
        // 1. Write a test pattern from the PRBS8 generator to memory.
        // 2. Read it back and compare bits 7,5,3,1 (falling edge bits) of the
        //    byte with the corresponding bits of the test pattern.
        // 3. Generate the next pattern from the PRBS8 generator.
      END for // byte_len
      // 1. Check whether all eight patterns matched at this delay tap.
      // 2. Increment the IOD delay tap as the training logic searches for a
      //    tap at which all 8 patterns match.
      // 3. The first such tap marks the start of the valid window; record its
      //    delay value (delay left).
      // 4. Continue to increment the delay tap while the patterns match. Once there
      //    have been 7 (x 8 patterns) consecutive matches and there is no longer a
      //    match, save this second delay value (delay right), calculate the mid
      //    value (left+right)/2, and exit the loop.
    END for // delay_taps
    // After trying all delay values (or exiting the loop early):
    // 1. If a solution is found, set the IOD delay tap to the midpoint of the
    //    valid window.
    // 2. If a solution is not found on the first IOD of the lane, increase the
    //    offset (see data framing), reload the default delay value, set j=1,
    //    and retry.
    // 3. If the taps are out of range and there is no solution, increment the
    //    failure count and try again from the beginning. If the number of
    //    failures is greater than 15, declare ERROR.
  END for // num_IODs_in_lane
  Perform CQ alignment
END for // num_lanes
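
The window search in the delay tap loop can be sketched in software as follows. The input is a list of per-tap results, where each entry is True only if all 8 patterns matched at that tap. This is a minimal sketch, not the hardware implementation. It rests on several assumptions: taps are indexed from 0, 7 is read as the minimum number of consecutive matching taps, delay right is taken to be the last matching tap, and a window that is still open at the final tap is closed there. The name `find_valid_window` is hypothetical.

```python
def find_valid_window(tap_passes, min_run=7):
    """Return (left, right, mid) for the first run of at least `min_run`
    consecutive passing taps, or None if no such run exists."""
    left = None
    for tap, ok in enumerate(tap_passes):
        if ok and left is None:
            left = tap                      # start of a candidate window
        elif not ok and left is not None:
            if tap - left >= min_run:
                right = tap - 1             # last tap that still matched
                return left, right, (left + right) // 2
            left = None                     # window too narrow, keep searching
    if left is not None and len(tap_passes) - left >= min_run:
        right = len(tap_passes) - 1         # window reaches the last tap
        return left, right, (left + right) // 2
    return None


if __name__ == "__main__":
    results = [False] * 10 + [True] * 20 + [False] * 5
    print(find_valid_window(results))
```

Placing the tap at the midpoint gives roughly equal margin on both sides of the valid window, which is the reason for recording both delay left and delay right rather than stopping at the first match.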

The timing relationships between CQP, CQN, data (Q), and data valid before training begins are shown in the following figure.

Figure 5-4. Data Capture Before Training Begins

First, the falling-edge data alignment is done concurrently with the data_valid alignment. As a result, data_valid properly frames the 8-bit burst coming from the QDR device, and CQN centers the falling-edge data (f0, f1, f2, f3).

Important: CQN is not moved because it cannot be moved in the specified architecture; instead, Q is moved to align with CQN by performing write and read of 8 different patterns and selecting a match if and only if all patterns match. This is done on a per-Q basis, with the exception of data framing, which is only performed on Q<0>.

A high-level timing diagram of alignment is shown in the following figure.

Figure 5-5. Data Capture After Aligning Data Valid and Falling Edge Data to CQN

Next, CQP is aligned with the rising edge data by performing write and read of 8 different patterns and selecting a match if and only if all patterns match.

Figure 5-6. Data Capture After Moving CQP to Center Rising Edge Data