Training Logic

The training logic is implemented as a soft FPGA core that de-skews read data and center aligns the clock using IOD capabilities in the read direction. A known training pattern, generated by the QDR controller, is used for training.

The IOD PHY receives the read data and performs a serial-to-parallel conversion. For a QDR data width of 36, the IOD PHY collects the bytes for a total of 72 bits received per lane and passes them to the respective SHIM block. If required, the training logic shifts the pattern by full bit times in the SHIM. This is called data framing, and it is repeated until the received pattern matches the expected pattern. The training logic compares the bytes to the expected pattern and adjusts delay taps in the IOD PHY to center-align the data (“Q”) around the memory clock CQP. This is called Q alignment. The operation is performed sequentially, byte by byte and lane by lane, over the entire 288-bit word (4 lanes x 72 bits). The following figure shows the receive path in detail for the QDR data width = 36 case.
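The data framing step described above can be pictured with a minimal software sketch. This is only an illustration, not the SHIM implementation: the function names, the MSB-first bit order, and the sample capture are assumptions made for the example.

```python
def to_bits(value, width=8):
    """Return the bits of `value`, most significant bit first (assumed order)."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def frame_at(bits, offset, width=8):
    """Assemble the `width`-bit word that starts `offset` bit times into `bits`."""
    word = 0
    for bit in bits[offset:offset + width]:
        word = (word << 1) | bit
    return word


def find_frame_offset(bits, expected, width=8):
    """Shift by one full bit time per iteration until the framed word matches.

    Returns the offset that frames `expected`, or None if no offset in
    range(width) does.
    """
    for offset in range(width):
        if offset + width > len(bits):
            break  # not enough captured bits to frame a full word
        if frame_at(bits, offset, width) == expected:
            return offset
    return None


# Hypothetical capture: the expected byte 0xA5 arrives three bit times late.
captured = [0, 0, 0] + to_bits(0xA5) + [1, 1, 1, 1, 1]
offset = find_frame_offset(captured, 0xA5)
```

In this sketch the search stops at the first offset that frames the expected byte, which mirrors the iterate-until-match behavior of data framing.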
Figure 1. Receive Path
Important: QDR training is performed on the receive path only. Following QDR vendor-specific PCB guidelines ensures proper write operations.

The QDR RAM uses the CQP and CQN clock pair to capture data in the PHY. IODs in the PHY only use CQP for sampling data. The Lane Controller adjusts delay taps to center CQP within the valid data eye. This tuning occurs during CQ alignment.

Training Pattern

Training FSM

The FSM is designed to iterate across lanes and across each IOD data pad. Data framing, Q alignment, and CQ alignment are all performed using the same methodology: a sequence is written into memory and read back, and the received data is compared with the transmitted data. Tuning actions are taken based on this comparison. What differentiates the three phases is the block in which these tuning actions are performed.

The algorithm pseudo code of the whole process (data framing, Q alignment, CQ alignment) is as follows:
for k=1:num_lanes
  for g=1:num_IODs_in_lane(9)
    for j=1:delay_taps(256)
      for i=1:byte_len(8)
        // 1. Write a test pattern from the PRBS8 generator to memory.
        // 2. Read it back and compare bits 7,5,3,1 (falling edge bits) of the
        //    received byte with the corresponding bits of the test pattern.
        // 3. Generate the next pattern from the PRBS8 generator.
      END for // byte_len
      // 1. Compare the eight generated patterns.
      // 2. Increment the IOD delay taps while the training logic searches for a
      //    tap at which all 8 patterns match.
      // 3. The first such tap indicates the start of the valid window, and its
      //    delay value (delay left) is recorded.
      // 4. Continue to increment the delay taps while the patterns match. After 7
      //    consecutive matches (of all 8 patterns), when there is no longer a
      //    match, save this second delay value (delay right), calculate the mid
      //    value (left+right)/2, and exit the loop.
    END for // delay_taps
    // 1. After trying all delay values (or exiting the loop):
    // 2. If a solution is found, the IOD delay tap is set to the midpoint of the
    //    valid window.
    // 3. If a solution is not found on the first IOD of the lane, increase the
    //    offset (see data framing), reload the default delay value, set j=1, and
    //    retry the delay sweep.
    // 4. If out of range and there is no solution, increase the fail number and
    //    try again from the beginning. If the number of failures is greater than
    //    15, declare ERROR.
  END for // num_IODs_in_lane
  Perform CQ alignment
END for // num_lanes
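
The delay-tap sweep for a single IOD can be sketched in software as follows. This is a hedged illustration rather than the FSM itself. The callable `matches_at_tap` is hypothetical and stands in for "all 8 patterns matched at this tap". Taps are numbered from 0 here, while the pseudo code counts from 1. The sketch reads "delay right" as the first tap at which the match is lost, discards runs shorter than 7 taps, and reports no solution if the window does not close before the last tap. All three choices are assumptions.

```python
def find_valid_window(matches_at_tap, num_taps=256, min_run=7):
    """Sweep the delay taps and return (left, right, mid), or None if no window is found."""
    left = None
    run = 0  # number of consecutive matching taps seen so far
    for tap in range(num_taps):
        if matches_at_tap(tap):
            if run == 0:
                left = tap  # first matching tap: start of the candidate window
            run += 1
        else:
            if run >= min_run:
                right = tap  # match lost after a long enough run
                return left, right, (left + right) // 2
            # Run too short to count as a valid window: start over.
            run = 0
            left = None
    return None


# Hypothetical data eye: a 3-tap glitch, then a valid window on taps 40..99.
def eye(tap):
    return 10 <= tap <= 12 or 40 <= tap <= 99


window = find_valid_window(eye)
```

With this made-up eye, the short glitch is ignored and the returned midpoint lies in the middle of the longer window, which is the tap value the IOD would be set to.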

The timing relationships between CQP, CQN, data (Q), and data valid before training begins are shown in the following figure.

Figure 2. Data Capture before Training begins

First, the falling edge data alignment is done concurrently with the data_valid alignment. As a result, data_valid properly frames the 8-bit burst coming from the QDR device, and CQN is centered on the falling edge data (f0, f1, f2, f3).

Note: CQN cannot be moved in the specified architecture. Instead, Q is moved to align with CQN by writing and reading back 8 different patterns and declaring a match if and only if all patterns match. This is done on a per-Q basis, with the exception of data framing, which is only performed on Q<0>.
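
The "match if and only if all patterns match" rule can be illustrated with a short sketch. The mask selects bits 7, 5, 3, 1, the falling edge bits named in step 2 of the pseudo code. The function name and the eight sample bytes are assumptions for illustration only and are not actual PRBS8 generator output.

```python
FALLING_EDGE_MASK = 0b10101010  # bits 7, 5, 3, 1, per step 2 of the pseudo code


def all_patterns_match(expected, received, mask=FALLING_EDGE_MASK):
    """Return True only if every received byte equals its expected byte on the masked bits."""
    if len(expected) != len(received):
        return False
    return all(((e ^ r) & mask) == 0 for e, r in zip(expected, received))


# Hypothetical data: eight arbitrary test bytes stand in for the PRBS8 patterns.
patterns = [0x1D, 0x3A, 0x74, 0xE8, 0xCD, 0x87, 0x13, 0x26]

# Bits 6, 4, 2, 0 differ in every byte, but the masked bits all agree.
unmasked_bits_differ = [p ^ 0b01010101 for p in patterns]

# A single wrong falling edge bit in one byte rejects the whole tap setting.
one_bad_bit = list(patterns)
one_bad_bit[5] ^= 0b10000000
```

Here `all_patterns_match(patterns, unmasked_bits_differ)` is true because only unmasked bits differ, while `all_patterns_match(patterns, one_bad_bit)` is false because one of the eight patterns fails.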

A high-level timing diagram of alignment is shown in the following figure.

Figure 3. Data capture after aligning data valid and falling edge data to CQN

Next, CQP is aligned with the rising edge data by writing and reading back 8 different patterns and declaring a match if and only if all patterns match.

Figure 4. Data capture after moving CQP to center rising edge data