4 Training Logic

The training logic manages the training requests between the integrated PHY and DDR controller modules, and performs the following operations:

Clock training
Write leveling
Read leveling
I/O calibration

The CTRLR_READY signal is asserted to indicate the completion of initialization and training. This signal can be monitored from the fabric.

Clock Training

HS_IO_CLK to SYS_CLK training—aligns HS_IO_CLK to the DDR controller clock. When it is aligned, the data can be transferred to or from the user logic.
CMD/ADDR to REF_CLK training—aligns the rising edge of REF_CLK to the center of the address and command buses of the DDR memory. When the rising edge is aligned, DDR commands can be written to the SDRAM.
CMD/ADDR to REF_CLK training
- Offset of the REF_CLK to CMD/ADDR buses relies on internal silicon delays and can be affected by board signal integrity.
- The user can include an additional phase offset of the signals through the memory configurator on the general tab using the CK/CA additive offset field. Each offset increment equates to 45 degrees.
- The default setting of 0 was derived from test results using a DIMM based validation platform.
- The user may need to adjust this parameter based on the specific board requirements. It is recommended to derive the accurate values of skew and termination based on simulations.
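
To make the offset arithmetic concrete, the following minimal sketch converts a CK/CA additive offset value into a phase shift and an equivalent delay. The function name `ck_ca_offset` and the `ref_clk_mhz` parameter are hypothetical, and the sketch assumes that the 45-degree step is measured against one REF_CLK period, which the text above does not state explicitly.

```python
from typing import Tuple

# From the text: each offset increment equates to 45 degrees.
DEGREES_PER_STEP = 45


def ck_ca_offset(steps: int, ref_clk_mhz: float) -> Tuple[int, float]:
    """Return (phase shift in degrees, equivalent delay in ps).

    Assumption: the 45-degree step is a fraction of one REF_CLK period.
    """
    phase_deg = steps * DEGREES_PER_STEP
    period_ps = 1e6 / ref_clk_mhz  # one clock period in picoseconds
    delay_ps = period_ps * phase_deg / 360
    return phase_deg, delay_ps


# Example: offset field set to 2 with an 800 MHz clock.
phase, delay = ck_ca_offset(steps=2, ref_clk_mhz=800.0)
print(f"{phase} degrees = {delay:.1f} ps")
```

Under these assumptions, an offset of 2 at 800 MHz corresponds to 90 degrees, or 312.5 ps. A figure like this can be compared against the CK to CMD/ADDR skew obtained from board simulations.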

Write Leveling

Write leveling is a training mode used during DRAM initialization. The write leveling process identifies the delay at which the write DQS rising edge aligns with the rising edge of REF_CLK. With this delay identified, the system can accurately align the write DQS with REF_CLK. When it is aligned, data can be written to the SDRAM. This alignment is at the memory chip, not at the I/O lane.

The fly-by topology introduced in DDR3 improves signal integrity, but it creates a different skew between CLK and DQS at each DRAM device. Long traces on a DIMM can create skew greater than one MEM_CLK cycle.

However, the JEDEC DDR tDQSS specification requires DQS and CLK to align within ±0.25 MEM_CLK cycle. Write leveling compensates for this skew. It is the first in a sequence of training steps. The memory controller has a programmable DQS delay. The DRAM provides feedback to indicate whether the DQS leads or lags the CLK.

As shown in the following figure, the training logic sends out widely spaced DQS pulses. Inside the DRAM, CLK drives the D input of a flip-flop and DQS drives its clock input. The Q output of the flip-flop is fed back on the prime DQ (the DQ bit on which feedback is provided). The prime DQ can differ between vendors. The objective is to detect a 0-to-1 transition on CLK at a DQS rising edge. This is done by moving the DQS in small steps until the sampled CLK value changes from 0 to 1. When the transition is detected, REF_CLK is aligned with DQS.
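
The sweep described above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the training logic itself: CLK is modeled as an ideal 50% duty-cycle square wave, the DQS delay is stepped in fixed increments, and the names `sample_clk`, `write_level`, `clk_edge_ps`, and `step_ps` are hypothetical.

```python
def sample_clk(dqs_edge_ps, clk_edge_ps, period_ps):
    """Model the DRAM flip-flop: CLK on D, DQS rising edge as the clock.

    clk_edge_ps is the arrival time of a CLK rising edge at the DRAM.
    Returns the CLK level (0 or 1) seen at the DQS rising edge.
    """
    phase = (dqs_edge_ps - clk_edge_ps) % period_ps
    return 1 if phase < period_ps / 2 else 0


def write_level(clk_edge_ps, period_ps, step_ps, max_steps):
    """Step the DQS delay until the sampled CLK changes from 0 to 1.

    Returns the delay setting (in steps) at which the transition is seen,
    or None if no transition is found within max_steps.
    """
    previous = None
    for step in range(max_steps):
        sampled = sample_clk(step * step_ps, clk_edge_ps, period_ps)
        if previous == 0 and sampled == 1:
            return step  # DQS rising edge now coincides with CLK rising edge
        previous = sampled
    return None


# Example: CLK rising edge arrives 400 ps late, 1250 ps period, 25 ps steps.
setting = write_level(clk_edge_ps=400, period_ps=1250, step_ps=25, max_steps=100)
print("DQS delay setting:", setting)
```

In this model the sweep stops at the first setting where the previous sample was 0 and the current sample is 1, so the residual misalignment is at most one delay step.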

Read Leveling

DQS gate training—The training logic has a DQS gate signal for capturing the correct DQS strobe during read operations and for removing the DQS glitches. The purpose of gate training is to determine the optimum delay that can be applied to the DQS gate for it to function properly.
As shown in the following figure, when the bus is idle, DQ/DQS are at VDD/2 and the output of the DQS receiver is undefined. Read DQS is used internally to clock the read data FIFO. The write pointer for read data must correspond to every incoming piece of data. This means the incoming DQS must be qualified with the DQS gate cover pulse, which goes high during the pre-amble and goes low during the post-amble. For board design guidelines for DQ and DQS pins, see PolarFire Board Design Recommendations and PolarFire SoC Board Design Recommendations.
The memory controller generates the read commands required for DQS gate training. The training logic places the DQS gate in the middle of the burst. It detects a 0-to-1 transition of DQS at the rising edge of the DQS gate. After the transition is detected, the DQS gate is advanced by 90° (of MEM_CLK) to place the rising edge in the middle of the DQS high pulse. Then, the rising edge is walked back one MEM_CLK at a time. The rising edge detects a value of 1 until it hits pre-amble.
When the rising edge of the DQS gate detects a value of 0 for the first time during this walk back process, it is in the pre-amble area of the DQS burst. Only the rising edge of the DQS gate needs to be trained. The training logic runs internal timers to place the falling edge of the gate in the post-amble area.
Figure 4-2. DQS Gate Training
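
The gate training sequence described above can be sketched as a behavioral model. The sketch assumes an idealized read DQS (one MEM_CLK cycle of low pre-amble followed by a toggling burst), models the undefined idle level as 0 for simplicity, and interprets "advanced by 90°" as delaying the gate edge by a quarter of a MEM_CLK cycle. The function and parameter names are hypothetical.

```python
def dqs_level(t_ps, burst_start_ps, period_ps, burst_cycles):
    """Idealized read DQS level at time t_ps."""
    rel = t_ps - burst_start_ps
    if -period_ps <= rel < 0:
        return 0  # pre-amble: DQS driven low for one cycle
    if 0 <= rel < burst_cycles * period_ps:
        return 1 if (rel % period_ps) < period_ps // 2 else 0
    return 0  # idle is undefined in hardware; modeled as 0 in this sketch


def train_dqs_gate(start_ps, burst_start_ps, period_ps, burst_cycles, step_ps):
    """Return the trained position of the DQS gate rising edge."""
    def level(t):
        return dqs_level(t, burst_start_ps, period_ps, burst_cycles)

    # 1. Start in the middle of the burst and look for a 0-to-1 DQS transition.
    t = start_ps
    previous = level(t)
    for _ in range(2 * period_ps // step_ps):
        t += step_ps
        current = level(t)
        if previous == 0 and current == 1:
            break
        previous = current
    else:
        raise RuntimeError("no DQS rising edge found")

    # 2. Move a quarter cycle into the middle of the DQS high pulse.
    t += period_ps // 4

    # 3. Walk back one MEM_CLK at a time until a 0 is sampled (pre-amble).
    while level(t) == 1:
        t -= period_ps
    return t


# Example: 1000 ps period, burst of 4 cycles starting at 5000 ps.
gate_edge = train_dqs_gate(start_ps=6600, burst_start_ps=5000,
                           period_ps=1000, burst_cycles=4, step_ps=50)
print("gate rising edge at", gate_edge, "ps")
```

In this model the walk back ends three quarters of a cycle before the first DQS rising edge, which is inside the pre-amble. The falling edge of the gate is not modeled because, as described above, it is placed by internal timers.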
Align read DQ bits—aligns the read DQ edges for a lane to increase the size of the data window.
Align read DQS to DQ—aligns the DQS strobe to the center of the data window so that the margins on either side are balanced. DQ/DQS centering is a two-step process. First, all DQ bits are aligned to maximize the data valid window. Then, the DQS strobe is placed at the center. Because of variables such as package routing, parasitics (controller and device), and board layout, the performance of all DQ bits is not identical. Each DQ bit in a byte lane is affected differently. Consequently, there is always a best case and a worst case DQ bit within a byte lane. Placement of DQS at the center of the data valid window is done using known patterns. Starting from a position where the read data matches the data written to a location, the DQS is moved to the left and to the right until the comparison fails. After both the left and right margins are identified, the mean of the left and right margins [(left + right)/2] is computed, and the DQS is placed at the newly computed mean value. The following figures show DQ/DQS before and after centering.
Figure 4-3. DQ/DQS Before Centering
Figure 4-4. DQ/DQS After Centering
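
The centering computation can be sketched as follows. The sketch assumes that the DQS position is expressed in integer delay taps, that each DQ bit has a pass window given as a (low, high) tap range, and that a read passes only when every bit in the byte lane passes. These modeling choices and the names `read_passes` and `center_dqs` are hypothetical.

```python
def read_passes(dqs_tap, bit_windows):
    """A read matches the known pattern only if every DQ bit is valid."""
    return all(low <= dqs_tap <= high for low, high in bit_windows)


def center_dqs(start_tap, bit_windows, max_tap):
    """Find the left and right margins and return (left, right, center)."""
    if not read_passes(start_tap, bit_windows):
        raise ValueError("centering must start from a passing position")

    # Move DQS to the left until the next position would fail.
    left = start_tap
    while left - 1 >= 0 and read_passes(left - 1, bit_windows):
        left -= 1

    # Move DQS to the right until the next position would fail.
    right = start_tap
    while right + 1 <= max_tap and read_passes(right + 1, bit_windows):
        right += 1

    # Place DQS at the mean of the two margins: (left + right) / 2.
    return left, right, (left + right) // 2


# Example: three DQ bits with slightly different windows in one byte lane.
windows = [(10, 40), (14, 44), (8, 38)]
left, right, center = center_dqs(start_tap=20, bit_windows=windows, max_tap=63)
print(left, right, center)
```

In the example, the usable window is bounded by the worst case bits on each side (taps 14 and 38), so the strobe lands at tap 26. This also shows why aligning the DQ bits first widens the window that the strobe is centered in.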

I/O Calibration

The Fabric DDR subsystem performs DDR memory I/O calibration after device power-up. For more information about I/O calibration, see the PolarFire Family Power-Up and Resets User Guide.