5.3.1.2 Training Logic
The training logic is implemented as a soft FPGA core that de-skews read data and center-aligns the clock using IOD capabilities in the read direction. A known training pattern, generated by the QDR controller, is used for training.
The QDR RAM uses the CQP and CQN clock pair to capture data in the PHY. IODs in the PHY use only CQP to sample data. The Lane Controller adjusts the delay taps to center CQP within the valid data eye; this tuning occurs during CQ alignment.
Training Pattern
- Q alignment: implemented to adjust the delay taps of the received serial bits. A pseudo-random generator produces a standard PRBS8 sequence with a period of 256 words.
  - Polynomial: $x^8 + x^6 + x^5 + x^4 + 1$
  - Seed value: 0xAA
- CQ alignment: used to center-align the clock with respect to the received data. The training pattern is fixed to 0xA5.
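To make the Q-alignment pattern concrete, here is a minimal Python sketch of a PRBS8 generator using the stated polynomial and seed. It is an illustration, not the controller's RTL: the shift direction, bit ordering, and one-shift-per-word packing are assumptions, and `prbs8` is a hypothetical name. A plain 8-bit maximal-length LFSR repeats after 255 words rather than the 256 stated above; how the controller arrives at 256 is not described here, so the sketch does not model it.

```python
from itertools import islice
from typing import Iterator

SEED = 0xAA  # seed value from the training-pattern description


def prbs8(seed: int = SEED) -> Iterator[int]:
    """Yield 8-bit words from a Fibonacci LFSR for x^8 + x^6 + x^5 + x^4 + 1.

    Each yielded word is the full register state, after which the register
    advances by one bit (assumed packing; the hardware may differ).
    """
    if not 0 < seed < 256:
        raise ValueError("seed must be a nonzero 8-bit value")
    state = seed
    while True:
        yield state
        # Taps 8, 6, 5, 4 (Fibonacci tap convention) correspond to register
        # bits 0, 2, 3, 4 when the register shifts right.
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
        state = (state >> 1) | (feedback << 7)


if __name__ == "__main__":
    print([f"0x{w:02X}" for w in islice(prbs8(), 8)])
```

Because the polynomial is primitive, any nonzero seed walks through all 255 nonzero 8-bit states before repeating.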
Training FSM
The FSM iterates across lanes and across each IOD data pad. Data framing, Q alignment, and CQ alignment all use the same methodology: a sequence is written to memory and read back, the received data is compared with the transmitted data, and tuning actions are taken based on that comparison. The three phases differ only in the block where these tuning actions are performed.
- Data framing: identifies the beginning of the data frame. A data valid signal, which is a delayed version of the read command, is used as a gate to sample the data. In Figure 5-3, this applies to Q0 (Lane 0), Q9 (Lane 1), Q18 (Lane 2), and Q27 (Lane 3). The CQP clock samples the data while the logic looks for a match between the write and read sequences, trying every IOD delay tap. If there is no match, the read sequence and/or the data valid signal are shifted by an integer number of bits using the shift register in the read SHIM layer. This process continues until there is a match; at that point, by definition, data valid frames the data.
- Q alignment: finds, for each bit, the delay tap setting that gives the widest data valid window across the data byte. Only the bits sampled on the falling edge are used to compare written and read data.
- CQ alignment: performed after Q alignment has set the delay taps for all IODs in a lane. This step fine-tunes the CQP delay taps. Because only CQP is used to capture data, only the delay taps on the CQP IOD are adjusted. Only the bits sampled on the rising edge are used to compare written and read data.
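The three phases share one write/read/compare step and differ in what they compare. The following Python sketch models that comparison under stated assumptions. `frame_offset` treats data framing as a search for the smallest bit shift at which the captured stream lines up with the written sequence; it ignores the per-shift IOD delay sweep the controller also runs. `masked_match` compares only selected bits: bits 7, 5, 3, 1 come from the training pseudocode, while bits 6, 4, 2, 0 for the rising edge are inferred as their complement. All function and constant names are hypothetical.

```python
from typing import Optional, Sequence

FALLING_EDGE_MASK = 0xAA  # bits 7, 5, 3, 1: compared during Q alignment
RISING_EDGE_MASK = 0x55   # bits 6, 4, 2, 0: compared during CQ alignment (inferred)


def masked_match(written: int, read: int, mask: int) -> bool:
    """True when `written` and `read` agree on every bit selected by `mask`."""
    return ((written ^ read) & mask) == 0


def frame_offset(expected: Sequence[int], captured: Sequence[int],
                 max_shift: int) -> Optional[int]:
    """Data framing: smallest bit shift at which `captured` lines up with `expected`.

    `captured` models the serial bit stream on a lane's first IOD (Q0, Q9,
    Q18, Q27). Shifting by `shift` bits stands in for moving the read
    sequence or the data valid gate with the read SHIM shift register.
    """
    n = len(expected)
    for shift in range(max_shift + 1):
        window = list(captured[shift:shift + n])
        if len(window) < n:
            break  # no more captured bits to try
        if window == list(expected):
            return shift
    return None
```

For example, if the written bits `[1, 1, 0, 1, 0, 0, 1, 0]` arrive three bits late, `frame_offset` returns 3; a difference only in bit 0 passes the falling-edge comparison but fails the rising-edge one.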
for k = 1:num_lanes
    for g = 1:num_IODs_in_lane (9)
        for j = 1:delay_taps (256)
            for i = 1:byte_len (8)
                // 1. Write a test pattern generated by the PRBS8 generator to memory.
                // 2. Read it back and compare bits 7, 5, 3, 1 (falling-edge bits)
                //    of the byte with the corresponding bits of the test pattern.
                // 3. Generate the next pattern from the PRBS8 generator.
            END for // byte_len
            // 1. Compare all eight generated patterns.
            // 2. Increment the IOD delay taps as the training logic searches for a
            //    setting at which all 8 patterns match.
            // 3. The first such setting marks the start of the valid window; record
            //    this delay value (delay left).
            // 4. Keep incrementing the delay taps while the patterns match. After at
            //    least 7 consecutive matching taps (each with all 8 patterns matching),
            //    when the patterns no longer match, save this second delay value
            //    (delay right), calculate the midpoint (left + right)/2, and exit the loop.
        END for // delay_taps
        // 1. After trying all delay values (or exiting the loop early):
        // 2. If a solution is found, set the IOD delay tap to the midpoint of the
        //    valid window.
        // 3. If a solution is not found on the first IOD of the lane, increase the
        //    offset (see Data framing), reload the default delay value, set j = 1,
        //    and retry the delay sweep.
        // 4. If the offset is out of range and there is no solution, increment the
        //    failure count and start again from the beginning. If the failure count
        //    exceeds 15, declare ERROR.
    END for // num_IODs_in_lane
    Perform CQ alignment
END for // num_lanes
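The delay-tap sweep and retry logic in the pseudocode can be modeled in Python as follows. This is a minimal sketch, not the controller's implementation. `all_patterns_match` is a hypothetical hook that returns True only when all 8 PRBS8 patterns read back correctly. Three interpretive choices are assumptions: "delay right" is taken as the last matching tap, a window still open at the final tap is rejected, and the offset retry is applied to whichever IOD `train_iod` is called on, whereas the pseudocode does this only for the first IOD of a lane.

```python
from typing import Callable, Optional, Tuple

MIN_CONSECUTIVE = 7  # consecutive matching taps required before accepting a window
MAX_FAILURES = 15    # ERROR is declared once the failure count exceeds this


def find_window(all_patterns_match: Callable[[int], bool],
                num_taps: int = 256) -> Optional[int]:
    """Sweep one IOD's delay taps; return the midpoint of the first wide-enough window."""
    left = None
    for tap in range(num_taps):
        if all_patterns_match(tap):
            if left is None:
                left = tap  # start of the valid window (delay left)
        elif left is not None:
            right = tap - 1  # last matching tap (delay right, assumed)
            if right - left + 1 >= MIN_CONSECUTIVE:
                return (left + right) // 2
            left = None  # window too narrow: keep sweeping
    return None  # includes a window still open at the last tap (assumption)


def train_iod(all_patterns_match: Callable[[int, int], bool], max_offset: int,
              num_taps: int = 256) -> Tuple[int, int]:
    """Return (bit offset, delay tap) for one IOD, retrying framing offsets.

    `all_patterns_match(offset, tap)` is a hypothetical hardware hook.
    """
    failures = 0
    while True:
        for offset in range(max_offset + 1):
            mid = find_window(lambda tap: all_patterns_match(offset, tap), num_taps)
            if mid is not None:
                return offset, mid
        failures += 1  # all offsets tried with no solution: start over
        if failures > MAX_FAILURES:
            raise RuntimeError("training ERROR: no valid window after retries")
```

With a simulated eye that passes only at offset 2 for taps 40 through 60, `train_iod` returns `(2, 50)`. A 5-tap window is rejected because it is shorter than the 7-tap minimum.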
The timing relationships between CQP, CQN, data (Q), and data valid before training begins are shown in the following figure.
First, falling-edge data alignment is performed concurrently with data_valid alignment. As a result, data_valid correctly frames the 8-bit burst from the QDR device, and CQN is centered on the falling-edge data (f0, f1, f2, f3).
A high-level timing diagram of alignment is shown in the following figure.
Next, CQP is aligned with the rising-edge data by writing and reading back 8 different patterns; a delay setting is selected as a match if and only if all 8 patterns match.
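The following minimal Python sketch models the CQP sweep. Only the CQP delay tap changes, and a tap passes only if every pattern matches on its rising-edge bits. The text mentions both a fixed 0xA5 CQ pattern and 8 different patterns, so the sketch takes any pattern list as a parameter rather than choosing one. `read_back(tap, pattern)` is a hypothetical hook that sets the CQP tap, writes the pattern, and returns the captured byte. The rising-edge mask 0x55 (bits 6, 4, 2, 0) is inferred from the falling-edge bits 7, 5, 3, 1.

```python
from typing import Callable, Optional, Sequence

RISING_EDGE_MASK = 0x55  # bits 6, 4, 2, 0 (inferred complement of bits 7, 5, 3, 1)


def cqp_tap_passes(read_back: Callable[[int, int], int],
                   patterns: Sequence[int], tap: int) -> bool:
    """A CQP tap passes only if every pattern matches on its rising-edge bits."""
    return all(((p ^ read_back(tap, p)) & RISING_EDGE_MASK) == 0 for p in patterns)


def align_cqp(read_back: Callable[[int, int], int],
              patterns: Sequence[int], num_taps: int = 256) -> Optional[int]:
    """Sweep the CQP delay tap and return the center of the first passing run."""
    left = right = None
    for tap in range(num_taps):
        if cqp_tap_passes(read_back, patterns, tap):
            if left is None:
                left = tap  # first passing tap
            right = tap  # extend the run
        elif left is not None:
            break  # the first passing run has closed
    if left is None or right is None:
        return None
    return (left + right) // 2
```

The all-patterns requirement means that a single pattern failing at a tap disqualifies that tap, which is what keeps the selected CQP position away from the edges of the rising-edge eye.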
