3.5.1.19.13 Error Correction Code (ECC) Library
The Error Correction Code (ECC) library offers API functions for exposing ECC signals from the hardware memory. To use the ECC library, include the header file:
#include "hls/ecc.hpp"
Accessing ECC Signals
ECC signals can be accessed through the API function `read_ecc`. `read_ecc` takes a pointer to the array element or the FIFO as its argument and returns a data wrapper that contains three elements:
- `data` - The data read from the address of the array element. When a single-bit error is detected, the data output is automatically corrected.
- `sb_correct` - True if a single-bit error was detected and corrected.
- `db_detect` - True if a double-bit error was detected.
In the RTL simulation waveform, both the `db_detect` and `sb_correct` flags are asserted when a multi-bit error occurs, and only the `sb_correct` flag is asserted for a single-bit error (refer to the RTG4 FPGA Fabric User Guide and the PolarFire Family Fabric User Guide for more details). To simplify the C++ design, SmartHLS™ automatically handles this pattern. You only need to check for `sb_correct=1` for single-bit errors and `db_detect=1` for multi-bit errors.
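The flag pattern described above can be illustrated with a small stand-alone C++ sketch. It does not use the SmartHLS library: `RawEccFlags`, `EccStatus`, and `normalize` are hypothetical names, and the masking shown is only an assumption about how the RTL pattern could be mapped to mutually exclusive flags.

```cpp
#include <cassert>

// Hypothetical type: the flags as they appear in the RTL waveform.
struct RawEccFlags {
    bool sb_correct;
    bool db_detect;
};

// Hypothetical type: the flags as seen from the C++ design.
struct EccStatus {
    bool sb_correct; // exactly one bit was wrong (and corrected)
    bool db_detect;  // more than one bit was wrong
};

// In the RTL a multi-bit error raises both flags, so sb_correct is
// masked with !db_detect to make the two flags mutually exclusive.
inline EccStatus normalize(RawEccFlags raw) {
    EccStatus s;
    s.sb_correct = raw.sb_correct && !raw.db_detect;
    s.db_detect = raw.db_detect;
    return s;
}

// Checks the two patterns described in the text.
inline void check_normalize() {
    RawEccFlags single = {true, false}; // single-bit error in RTL
    RawEccFlags multi = {true, true};   // multi-bit error in RTL
    assert(normalize(single).sb_correct && !normalize(single).db_detect);
    assert(!normalize(multi).sb_correct && normalize(multi).db_detect);
}
```

With flags in this form, a design can test `sb_correct` and `db_detect` independently without decoding the combined RTL pattern.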
```cpp
#include <stdio.h>
#include "hls/ecc.hpp"

#define SIZE 100

using namespace hls;

#pragma HLS memory impl variable(x) ecc(true)
int x[SIZE];

int f(int c) {
#pragma HLS function top
    int sb_count = 0;
    int db_count = 0;

    // Initialize the memory
    for (int i = 0; i < SIZE; i++)
        x[i] = c;

    int sum = 0;
    for (int i = 0; i < SIZE; i++) {
        auto ecc_out = read_ecc(&x[i]);
        // Check for single-bit error (corrected)
        if (ecc_out.sb_correct) {
            // Write back to correct the contents of the RAM
            x[i] = ecc_out.data;
            sb_count++;
        }
        // Check for double-bit error (detected)
        if (ecc_out.db_detect)
            db_count++;
        sum += ecc_out.data;
    }

    printf("sb_count: %d\n", sb_count);
    printf("db_count: %d\n", db_count);
    return sum;
}
```
In the example, variable `x` has ECC enabled by setting `ecc(true)` in the memory pragma. In the top-level function `f`, the elements of `x` are initialized using the argument `c`. After the initialization, `x` is read element-by-element using `read_ecc(&x[i])` instead of directly using `x[i]`. The call to `read_ecc` returns a data wrapper `ecc_out` that contains the data read from memory and the ECC signals. The ECC signals are used to increment the corresponding counters `sb_count` and `db_count`, and the read data is added to `sum`.
Note that when a single-bit error is detected (`sb_correct = true`), the read data output is corrected but the data in the RAM location is not updated. Thus, in the example, the corrected data is manually written back to the RAM for single-bit errors.
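The following stand-alone sketch illustrates why the write-back matters. It is a toy software model, not the SmartHLS implementation: `ToyEccWord`, `ToyEccRead`, `toy_read`, and `toy_write` are hypothetical names, and the cell is modeled simply as the written value plus a mask of corrupted bits.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one ECC-protected RAM word (hypothetical, for illustration).
struct ToyEccWord {
    std::uint32_t value = 0;   // value originally written
    std::uint32_t flipped = 0; // bits currently corrupted in the cell
};

struct ToyEccRead {
    std::uint32_t data;
    bool sb_correct;
    bool db_detect;
};

inline int count_bits(std::uint32_t m) {
    int n = 0;
    for (; m != 0; m &= m - 1) ++n; // clear the lowest set bit each step
    return n;
}

// Reading never modifies the cell: a single-bit error is corrected
// on the output only, so the same error is seen again on the next read.
inline ToyEccRead toy_read(const ToyEccWord &w) {
    int errors = count_bits(w.flipped);
    assert(errors >= 0);
    if (errors == 0) return {w.value, false, false};
    if (errors == 1) return {w.value, true, false};
    return {w.value ^ w.flipped, false, true}; // uncorrectable: data is corrupt
}

// A write stores fresh data (and, in hardware, fresh check bits),
// which clears the corruption in this model.
inline void toy_write(ToyEccWord &w, std::uint32_t data) {
    w.value = data;
    w.flipped = 0;
}
```

In this model, reading a word with one flipped bit reports `sb_correct` on every read until `toy_write` stores the corrected data back, which mirrors the manual write-back in the example.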
Inject ECC Error for Error Simulation
The ECC library provides the API function `inject_ecc_error` for simulating ECC errors.
`inject_ecc_error` takes two parameters:
- `address` - Pointer to the array element
- `mask` -
  - Masking bits that are applied to the read data output at the given address
  - A read data bit is flipped where the masking bit is 1
  - single-bit error if `mask` has only one non-zero bit
  - multi-bit error if `mask` has more than one non-zero bit

`address` and `mask` must be compile-time constants. Input of `address` and `mask` as variables is not supported. For example, `inject_ecc_error(&x[i], mask);` is not supported because `i` and `mask` are variables.

The generated Verilog file that contains all the error injection tasks is located in `hls_output/simulation/generated_include_file.v`.
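The mask rules above can be expressed as a short stand-alone sketch. `InjectedError`, `popcount64`, `classify_mask`, and `apply_mask` are hypothetical helpers, not part of the ECC library, and treating a zero mask as "no error" is an assumption. The helpers are `constexpr` to echo the requirement that the mask be a compile-time constant.

```cpp
#include <cstdint>

// Hypothetical classification of an injection mask.
enum class InjectedError { None, SingleBit, MultiBit };

// Number of set bits in the mask (recursion depth is at most 64).
constexpr int popcount64(std::uint64_t m) {
    return m == 0 ? 0 : 1 + popcount64(m & (m - 1));
}

// One set bit means a single-bit error, more than one a multi-bit error.
constexpr InjectedError classify_mask(std::uint64_t mask) {
    return popcount64(mask) == 0   ? InjectedError::None
           : popcount64(mask) == 1 ? InjectedError::SingleBit
                                   : InjectedError::MultiBit;
}

// Each read data bit is flipped where the corresponding mask bit is 1.
constexpr std::uint64_t apply_mask(std::uint64_t data, std::uint64_t mask) {
    return data ^ mask;
}

// Evaluated at compile time, so the masks must be constants here too.
static_assert(classify_mask(0x1) == InjectedError::SingleBit, "one bit set");
static_assert(classify_mask(0x3) == InjectedError::MultiBit, "two bits set");
```

Because the checks are `static_assert`s, passing a run-time variable as the mask would fail to compile, which is similar in spirit to the restriction on `inject_ecc_error` arguments.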
The following example shows how ECC error simulation works.
```cpp
#include <stdio.h>
#include "hls/ecc.hpp"

#define SIZE 100

using namespace hls;

int fct(int &sb_count, int &db_count) {
#pragma HLS function top

#pragma HLS memory impl variable(x) ecc(true)
    int x[SIZE];

    for (int i = 0; i < SIZE; i++) {
        x[i] = i;
    }

    inject_ecc_error(&x[3], /*mask 'b1*/ 1);
    inject_ecc_error(&x[4], /*mask 'b11*/ 3);
    inject_ecc_error(&x[99], /*mask 'b10001*/ 17);

    int sum = 0;
    for (int i = 0; i < SIZE; i++) {
        auto ecc_info = read_ecc(&x[i]);
        if (ecc_info.sb_correct) {
            sb_count++;
            printf("sb_correct at x[%d], data = %d\n", i, ecc_info.data);
        } else if (ecc_info.db_detect) {
            db_count++;
            printf("db_detect at x[%d], data = %d\n", i, ecc_info.data);
        }
        sum += ecc_info.data;
    }
    return sum;
}
```
In this example, `x` is a local memory with ECC enabled and initialized with incremental data. In the top-level function `fct`, `inject_ecc_error` is called three times to set error masks for `x[3]`, `x[4]` and `x[99]`. Single-bit or double-bit errors are injected at the given address based on the masking value (see code comments). In the next `for` loop, `x` is read element-by-element using `read_ecc(&x[i])`. A message is printed when a single-bit or double-bit error is detected, and the read data is added to `sum`.
The simulation transcript below shows the errors injected by `inject_ecc_error`.
```text
# sb_correct at x[ 3], data = 3
# db_detect at x[ 4], data = 7
# db_detect at x[ 99], data = 114
```
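The data values in the transcript follow directly from the masks. The stand-alone sketch below (hypothetical helpers `mask_bits` and `reported_data`, not library functions) models a read as returning the stored value when at most one bit is flipped, since a single-bit error is corrected, and the stored value XORed with the mask otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of set bits in the injection mask.
inline int mask_bits(std::uint32_t m) {
    int n = 0;
    for (; m != 0; m &= m - 1) ++n;
    return n;
}

// Data reported by a read in this toy model: a single flipped bit is
// corrected, two or more flipped bits reach the output unchanged.
inline int reported_data(int stored, std::uint32_t mask) {
    return mask_bits(mask) >= 2 ? (stored ^ static_cast<int>(mask)) : stored;
}

// Checks the model against the three transcript lines.
inline void check_transcript_values() {
    assert(reported_data(3, 1) == 3);     // sb_correct at x[3]
    assert(reported_data(4, 3) == 7);     // db_detect at x[4]:  4 ^ 3
    assert(reported_data(99, 17) == 114); // db_detect at x[99]: 99 ^ 17
}
```

For instance, `x[4]` holds 4 (binary 100) and mask 3 (binary 011) flips the two low bits, giving 7 (binary 111).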
SmartHLS provides memory optimizations for structs, such as packing and partitioning by struct fields. These optimizations can make error injection tricky to handle.
When a struct such as `ST` is automatically partitioned by struct fields (see Access-Based Memory Partitioning for more details), `inject_ecc_error` is applied on each struct field to match the RTL behavior.
On the other hand, when bit-packing is enabled on ECC structs, all struct fields are merged into one RAM block. In that case it is recommended to inject the error per struct to match the RTL behavior; for example, the mask `0x300000003ULL` is applied on the entire struct.
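To see what a whole-struct mask such as `0x300000003ULL` does to individual fields, consider the stand-alone sketch below. It assumes a hypothetical struct `ST` with two 32-bit fields, packed with field `a` in the low 32 bits; the actual field widths and packing order used by SmartHLS are not stated here and may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct: two 32-bit fields packed into one 64-bit RAM word.
struct ST {
    std::uint32_t a; // assumed to occupy bits 31..0 of the packed word
    std::uint32_t b; // assumed to occupy bits 63..32 of the packed word
};

inline std::uint64_t pack(ST s) {
    return (static_cast<std::uint64_t>(s.b) << 32) | s.a;
}

inline ST unpack(std::uint64_t w) {
    ST s;
    s.a = static_cast<std::uint32_t>(w & 0xFFFFFFFFu);
    s.b = static_cast<std::uint32_t>(w >> 32);
    return s;
}

// Applies an error mask to the packed word, then splits it back into fields.
inline ST flip_packed(ST s, std::uint64_t mask) {
    return unpack(pack(s) ^ mask);
}

// Mask 0x300000003 flips bits 0-1 and bits 32-33, i.e. the two low
// bits of each field under the layout assumed above.
inline void check_packed_mask() {
    ST zero = {0, 0};
    ST r = flip_packed(zero, 0x300000003ULL);
    assert(r.a == 3 && r.b == 3);
}
```

Under this assumed layout, the single 64-bit mask is equivalent to a per-field mask of `0x3` on both `a` and `b`, which is why one injection per struct matches a packed RAM while one injection per field matches a partitioned one.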
Limitations
- On the software side, the error injection mask is only applied to read data when accessing the RAM with `read_ecc`. Access with `operator[]` returns the original value without any error. To ensure that SW/HW Co-Simulation behaves correctly, it is recommended to use `ECC_RAM` or to always access the RAM with `read_ecc`, to avoid mismatches between software and hardware results.
- Currently, error injection calls are only supported in the hardware top-level function and all of its descendant functions. Error injection calls in the software testbench are ignored. See Specifying the Top-level Function for details on software testbench and top-level function.
- ECC error injection calls may affect the number of simulation cycles, depending on the memory access pattern. All `inject_ecc_error` calls are treated as write operations.
- `address` and `mask` must be compile-time constants, and the maximum supported size for the error injection mask is 64 bits. Input of `address` and `mask` as variables is not supported. For example, in `inject_ecc_error(&x[i], mask);` both `i` and `mask` are variables, so the call is not supported.
- If memory is partitioned into individual elements (registers), the ECC error injection logic is optimized away. This may result in a mismatch between software and hardware read data.
ECC RAM Wrapper
The ECC library also offers a C++ wrapper, `ECC_RAM`, that encapsulates some of the common functionality for handling errors. `ECC_RAM` is a pure C++ implementation using `read_ecc` to showcase how access to the low-level ECC signals can be abstracted and used seamlessly in the design.

`ECC_RAM<data_type, depth, SB_WRITE_BACK, DB_OVERRIDE, DB_DEFAULT> ecc_ram`

`ECC_RAM` uses template parameters to configure the memory and error handling:
- Data type - the element type of the memory.
- Depth - the number of elements in the memory.
- `SB_WRITE_BACK` - if true, when a single-bit error is detected, the corrected value is immediately written back to correct the corrupted data in the memory.
  CAUTION: Enabling immediate write-back using `SB_WRITE_BACK` can affect performance, since the load operation can invoke an immediate store to the same address. Take this into consideration if latency is critical (e.g. in a loop pipeline).
- `DB_OVERRIDE` - if true, instead of returning the corrupted data when a double-bit error is detected, a default value is used.
- `DB_DEFAULT` - the default value used when a double-bit error is detected.
The following example shows how to use `ECC_RAM`.
```cpp
#include <hls/ecc.hpp>
#include <stdio.h>

using namespace hls;

#define __REPORT_ECC__
#define N 1000

#pragma HLS memory impl variable(ecc_ram) ecc(true)
ECC_RAM<int,  // data type
        N,    // depth
        true, // SB_WRITE_BACK
        true, // DB_OVERRIDE
        -1    // DB_DEFAULT
        >
    ecc_ram;

// ----- Top function: Read i, j and write to k
int f(int i, int j, int k, int val) {
#pragma HLS function top

    // ----- Reading and handling errors implicitly
    int d_i = ecc_ram[i];

    // ----- Reading and handling errors explicitly
    int d_j = 0;
    if (!ecc_ram.read(j, d_j))
        d_j = -2;

    int sum = d_i + d_j;

    // ----- Writing
    ecc_ram[k] = val;

    // ----- Reporting
    auto sb_count = ecc_ram.sb_count();

    // ----- Scrubbing after a certain number of SB errors
    if (ecc_ram.sb_count() > N / 2)
        ecc_ram.scrub();

    return sum;
}
```
In the example, `ecc_ram` uses `ECC_RAM` to instantiate an `int` array with 1000 elements. `ECC_RAM` can be accessed like a C++ array using `[]`; however, the implementation uses `read_ecc` to access the data and the ECC signals, and handles errors based on the template configuration. With the configuration in the example:
- `ecc_ram[i]` will read the data and implicitly handle errors:
  - If a single-bit error is detected, write back the corrected data to the RAM and return the corrected data.
  - If a double-bit error is detected, discard the read data and return the `DB_DEFAULT` value `-1`.
- `ecc_ram.read(j, d_j)` will read the data at `j` and return the RAM data in `d_j` (correct or erroneous) to the caller:
  - If a single-bit error is detected, write back the corrected data to the RAM and set `d_j` to the read data.
  - If a double-bit error is detected, set `d_j` to the erroneous read data.
  - Return `true` if the data is correct, `false` if a double-bit error is detected.
`ECC_RAM` has internal counters that count the number of single-bit and double-bit errors for any read operation. The counters can be accessed using `ecc_ram.sb_count()` and `ecc_ram.db_count()`. The counters can be reset using `ecc_ram.reset_counters()`.
`ecc_ram.scrub()` scrubs the memory by reading it element-by-element and writing each element back. This can be useful when there are many errors, to refresh the entries with single-bit errors and avoid further corruption.
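The error-handling behavior described above can be summarized in a stand-alone toy model. This is not the `ECC_RAM` source: `ToyEccRam` and its members are hypothetical, `load` stands in for the read side of `operator[]`, errors are modeled as a per-element mask of flipped bits, and leaving multi-bit errors untouched during `scrub` is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Toy software model of an ECC RAM wrapper (hypothetical, for illustration).
template <int DEPTH, bool SB_WRITE_BACK, bool DB_OVERRIDE, int DB_DEFAULT>
class ToyEccRam {
    int value_[DEPTH] = {};              // data as originally written
    std::uint32_t flipped_[DEPTH] = {};  // injected bit flips per element
    int sb_count_ = 0;
    int db_count_ = 0;

    static int bits(std::uint32_t m) {
        int n = 0;
        for (; m != 0; m &= m - 1) ++n;
        return n;
    }

public:
    void write(int i, int v) { value_[i] = v; flipped_[i] = 0; }
    void inject_error(int i, std::uint32_t mask) { flipped_[i] = mask; }

    // Explicit read: returns false only when a double-bit error is seen,
    // in which case out receives the erroneous data.
    bool read(int i, int &out) {
        assert(i >= 0 && i < DEPTH);
        int n = bits(flipped_[i]);
        if (n >= 2) {
            ++db_count_;
            out = value_[i] ^ static_cast<int>(flipped_[i]);
            return false;
        }
        if (n == 1) {
            ++sb_count_;
            if (SB_WRITE_BACK) flipped_[i] = 0; // store corrected data back
        }
        out = value_[i];
        return true;
    }

    // Implicit read: optionally replaces uncorrectable data by DB_DEFAULT.
    int load(int i) {
        int d = 0;
        bool ok = read(i, d);
        return (!ok && DB_OVERRIDE) ? DB_DEFAULT : d;
    }

    int sb_count() const { return sb_count_; }
    int db_count() const { return db_count_; }
    void reset_counters() { sb_count_ = 0; db_count_ = 0; }

    // Rewrites every element with a correctable error, then resets counters.
    void scrub() {
        for (int i = 0; i < DEPTH; ++i)
            if (bits(flipped_[i]) == 1) flipped_[i] = 0;
        reset_counters();
    }
};
```

With `SB_WRITE_BACK` set, a second `load` of an element that had a single-bit error no longer increments `sb_count()` in this model, because the first read already repaired the stored data.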
`ECC_RAM` can report the error handling process to the standard output when `__REPORT_ECC__` is defined.
-- DB error detected at 0 Overriding with default value -1
- SB error corrected at 2 Writing back corrected value
`ECC_RAM` is a wrapper around the array that represents the actual memory. In this release, it is not possible to apply memory optimizations on the underlying data (e.g. partitioning).

Class Method | Description |
---|---|
`ECC_RAM<data_type, depth, SB_WRITE_BACK, DB_OVERRIDE, DB_DEFAULT>()` | Create a new ECC RAM with the specified parameters. |
`operator[i]` | Read/write data at index `i` and handle errors based on the configuration. |
`bool read(i, d_i)` | Read data at index `i` and save the read data from memory into `d_i`. Return true if the data is correct, false otherwise. |
`int sb_count()` | Return the number of single-bit errors detected and corrected. |
`int db_count()` | Return the number of double-bit errors detected. |
`void reset_counters()` | Reset both single-bit and double-bit counters to 0. |
`void scrub()` | Read the memory element-by-element and write each element back. Note that `scrub()` automatically calls `reset_counters()` to reset the error counters. |
`void inject_error(addr, mask)` | Inject an error at the given address based on the masking bits. |