3.5.1.17.12 Error Correction Code (ECC) Library

The Error Correction Code (ECC) library offers API functions for exposing ECC signals from the hardware memory. To use the ECC library, include the header file:

#include "hls/ecc.hpp"

Accessing ECC Signals

ECC signals are accessed through the API function read_ecc. read_ecc takes a pointer to an array element as its argument and returns a data wrapper that contains three elements:
data
The data read from the address of the array element. When a single-bit error is detected, the data output is automatically corrected.
sb_correct
True if a single-bit error was detected and corrected.
db_detect
True if a double-bit error was detected.
Important: ECC memory in Microchip devices does not automatically write back corrected data when a single-bit error is detected.
The following is an example of how the API is used.
#include <stdio.h>
#include "hls/ecc.hpp"
#define SIZE 100

using namespace hls;
 
#pragma HLS memory impl variable(x) ecc(true)
int x[SIZE];
 
int f(int c) {
#pragma HLS function top
    int sb_count = 0;
    int db_count = 0;
 
    // Initialize the memory
    for (int i = 0; i < SIZE; i++)
        x[i] = c;
 
    int sum = 0;
    for (int i = 0; i < SIZE; i++) {
        auto ecc_out = read_ecc(&x[i]);
 
        // Check for single-bit error (corrected)
        if(ecc_out.sb_correct) {
            // Write back to correct the contents of the RAM
            x[i] = ecc_out.data;
            sb_count++;
        }
        // Check for double-bit error (detected)
        if(ecc_out.db_detect)
            db_count++;
 
        sum += ecc_out.data;
    }
 
    printf("sb_count: %d\n", sb_count);
    printf("db_count: %d\n", db_count);
    return sum;
}

In the example, variable x has ECC enabled by setting ecc(true) in the memory pragma. In the top-level function f, the elements of x are initialized using the argument c. After the initialization, x is read element by element using read_ecc(&x[i]) instead of x[i] directly. The call to read_ecc returns a data wrapper, ecc_out, that contains the data read from memory and the ECC signals. The ECC signals are used to increment the corresponding counters, sb_count and db_count, and the read data is added to sum.

Note that when a single-bit error is detected (sb_correct is true), the data output is corrected but the data in the RAM location is not updated. For this reason, the example manually writes the corrected data back to the RAM on single-bit errors.
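
To illustrate why the write-back matters, the following is a minimal software sketch, not the hardware implementation. The source does not state which code the ECC memory uses; the sketch assumes a generic single-error-correct, double-error-detect (SECDED) scheme, here an extended Hamming(8,4) code over a 4-bit value chosen only for brevity. The names EccRead, ecc_encode and ecc_decode are hypothetical and are not part of hls/ecc.hpp.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative SECDED model: extended Hamming(8,4). Bit i of the codeword is
// "position i": positions 1, 2 and 4 hold the Hamming parity bits, position 0
// holds the overall parity bit, and positions 3, 5, 6, 7 hold the data bits.
// All names are hypothetical; nothing here comes from hls/ecc.hpp.
struct EccRead {
    uint8_t data;     // decoded 4-bit value (corrected on a single-bit error)
    bool sb_correct;  // one flipped bit was found and corrected
    bool db_detect;   // two flipped bits were found; data is not trustworthy
};

const int kDataPos[4] = {3, 5, 6, 7};

// XOR of the positions (1..7) of all set bits; zero for a valid codeword.
int syndrome(uint8_t cw) {
    int s = 0;
    for (int pos = 1; pos < 8; ++pos)
        if ((cw >> pos) & 1) s ^= pos;
    return s;
}

// True if the codeword contains an odd number of set bits.
bool odd_parity(uint8_t cw) {
    bool odd = false;
    for (int pos = 0; pos < 8; ++pos)
        if ((cw >> pos) & 1) odd = !odd;
    return odd;
}

uint8_t ecc_encode(uint8_t nibble) {
    assert(nibble < 16);
    uint8_t cw = 0;
    for (int b = 0; b < 4; ++b)
        if ((nibble >> b) & 1) cw |= uint8_t(1u << kDataPos[b]);
    int s = syndrome(cw);
    for (int k = 0; k < 3; ++k)          // set parity bits so the syndrome becomes 0
        if ((s >> k) & 1) cw |= uint8_t(1u << (1 << k));
    if (odd_parity(cw)) cw |= 1u;        // make the overall parity even
    return cw;
}

EccRead ecc_decode(uint8_t cw) {
    EccRead r = {0, false, false};
    int s = syndrome(cw);
    if (odd_parity(cw)) {                // odd overall parity: treated as one flipped bit
        cw ^= uint8_t(1u << s);          // s == 0 means the overall parity bit itself
        r.sb_correct = true;
    } else if (s != 0) {                 // even parity but non-zero syndrome: two flips
        r.db_detect = true;
    }
    for (int b = 0; b < 4; ++b)
        if ((cw >> kDataPos[b]) & 1) r.data |= uint8_t(1u << b);
    return r;
}
```

In this model, decoding a word with one flipped bit sets sb_correct and returns the original value, but the stored word still carries the flip. If a second bit of the same unrepaired word flips later, decoding reports db_detect and the value is lost. Re-encoding the corrected value first (the software equivalent of the write-back in the example) turns that second upset into another correctable single-bit error.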

ECC RAM Wrapper

The ECC library also offers a C++ wrapper, ECC_RAM, that encapsulates common error-handling functionality. ECC_RAM is a pure C++ implementation built on read_ecc that showcases how access to the low-level ECC signals can be abstracted and used seamlessly in a design.
ECC_RAM<data_type, depth, SB_WRITE_BACK, DB_OVERRIDE, DB_DEFAULT> ecc_ram
ECC_RAM uses template parameters to configure the memory and error handling:
Data type
The element type of the memory.
Depth
The number of elements in the memory.
SB_WRITE_BACK
If true, when a single-bit error is detected, the corrected value is immediately written back to the memory to repair the corrupted data.
CAUTION: Enabling immediate write-back with SB_WRITE_BACK can affect performance, because a load operation can trigger an immediate store to the same address. Take this into account when latency is critical (e.g., in a loop pipeline).
DB_OVERRIDE
If true, a default value is returned instead of the corrupted data when a double-bit error is detected.
DB_DEFAULT
The default value returned when a double-bit error is detected and DB_OVERRIDE is true.
The example below illustrates error handling using ECC_RAM.
#include <hls/ecc.hpp>
#include <stdio.h>

using namespace hls;

#define __REPORT_ECC__

#define N 1000
#pragma HLS memory impl variable(ecc_ram) ecc(true)
ECC_RAM<int,  // data type
        N,    // depth
        true, // SB_WRITE_BACK
        true, // DB_OVERRIDE
        -1    // DB_DEFAULT
        >
    ecc_ram;

// ----- Top function: Read i, j and write to k
int f(int i, int j, int k, int val) {
#pragma HLS function top

    // ----- Reading and handling errors implicitly
    int d_i = ecc_ram[i];

    // ----- Reading and handling errors explicitly
    int d_j = 0;
    if (!ecc_ram.read(j, d_j))
        d_j = -2;

    int sum = d_i + d_j;

    // ----- Writing
    ecc_ram[k] = val;

    // ----- Reporting
    auto sb_count = ecc_ram.sb_count();

    // ----- Scrubbing after a certain number of SB errors
    if (sb_count > N / 2)
        ecc_ram.scrub();

    return sum;
}

In the example, ecc_ram uses ECC_RAM to instantiate an int array with 1000 elements.

ECC_RAM can be accessed like a C++ array using []. However, the implementation uses read_ecc to access the data and the ECC signals, and handles errors based on the template configuration. With the configuration in the example:

  • ecc_ram[i] reads the data and handles errors implicitly:
    • If a single-bit error is detected, write the corrected data back to the RAM and return the corrected data.
    • If a double-bit error is detected, discard the read data and return the DB_DEFAULT value -1.
  • ecc_ram.read(j, d_j) reads the data at j and returns the RAM data (correct or erroneous) to the caller in d_j:
    • If a single-bit error is detected, write the corrected data back to the RAM and set d_j to the corrected read data.
    • If a double-bit error is detected, set d_j to the erroneous read data.
    • Return true if the data is correct, false if a double-bit error is detected.
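
The behavior in the list above can be sketched as a plain C++ model that runs on a host without ECC hardware. This is an assumption-laden illustration of how such a wrapper could be structured, not the ECC_RAM source. ModelEccRam, EccOut, Fault, get, write and inject are hypothetical names, the low-level ECC read is replaced by a mock with injectable faults, and a double-bit error is modeled arbitrarily by flipping the two lowest bits of the stored value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical software stand-ins; none of these names come from hls/ecc.hpp.
enum class Fault { None, SingleBit, DoubleBit };

template <typename T>
struct EccOut {
    T data;
    bool sb_correct;
    bool db_detect;
};

template <typename T, std::size_t DEPTH, bool SB_WRITE_BACK, bool DB_OVERRIDE, T DB_DEFAULT>
class ModelEccRam {
public:
    // A write stores a fresh value, which also clears any injected fault.
    void write(std::size_t i, T v) {
        assert(i < DEPTH);
        mem_[i] = v;
        fault_[i] = Fault::None;
    }

    // Test hook: pretend that bits of element i were upset.
    void inject(std::size_t i, Fault f) {
        assert(i < DEPTH);
        fault_[i] = f;
    }

    // Models a read through []: errors are handled implicitly.
    T get(std::size_t i) {
        EccOut<T> e = checked_read(i);
        return (e.db_detect && DB_OVERRIDE) ? DB_DEFAULT : e.data;
    }

    // Models read(i, out): the caller gets the data and a success flag.
    bool read(std::size_t i, T &out) {
        EccOut<T> e = checked_read(i);
        out = e.data;
        return !e.db_detect;
    }

    int sb_count() const { return sb_count_; }
    int db_count() const { return db_count_; }

private:
    // Mock of the low-level ECC read: single-bit faults come back corrected,
    // double-bit faults come back as corrupted data.
    EccOut<T> raw_read(std::size_t i) const {
        assert(i < DEPTH);
        if (fault_[i] == Fault::SingleBit) return {mem_[i], true, false};
        if (fault_[i] == Fault::DoubleBit) return {static_cast<T>(mem_[i] ^ 3), false, true};
        return {mem_[i], false, false};
    }

    // Shared error handling: count errors and optionally repair the memory.
    EccOut<T> checked_read(std::size_t i) {
        EccOut<T> e = raw_read(i);
        if (e.sb_correct) {
            ++sb_count_;
            if (SB_WRITE_BACK) write(i, e.data);
        }
        if (e.db_detect) ++db_count_;
        return e;
    }

    std::array<T, DEPTH> mem_{};
    std::array<Fault, DEPTH> fault_{};
    int sb_count_ = 0;
    int db_count_ = 0;
};
```

With ModelEccRam<int, 8, true, true, -1>, a single-bit fault injected at an index is counted once: the first read repairs the entry, so a second read of the same index no longer increments sb_count(). A double-bit fault makes get() return -1, while read() returns false and passes the corrupted value through to the caller.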

ECC_RAM has internal counters that count the number of single-bit and double-bit errors seen by read operations. The counters can be read using ecc_ram.sb_count() and ecc_ram.db_count(), and reset using ecc_ram.reset_counters().

ecc_ram.scrub() scrubs the memory by reading it element by element and writing each element back. This is useful when many errors have accumulated: it refreshes the entries that have single-bit errors and helps avoid further corruption.
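
The effect of scrubbing can be sketched with a small host-side model. It assumes reads without write-back and only models single-bit errors, because the source does not say how scrub() treats double-bit errors. ScrubModel, ScrubWord, inject_single_bit and latent_errors are hypothetical names and not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical behavioral model; not the ECC_RAM implementation.
struct ScrubWord {
    int value;        // value the ECC logic delivers (already corrected)
    bool sb_pending;  // true while the stored bits still contain one flipped bit
};

class ScrubModel {
public:
    explicit ScrubModel(std::size_t depth) : mem_(depth, ScrubWord{0, false}) {}

    // Test hook: mark element i as holding one flipped bit.
    void inject_single_bit(std::size_t i) {
        assert(i < mem_.size());
        mem_[i].sb_pending = true;
    }

    // Read without write-back: the output is corrected, but the stored word
    // stays corrupted, so every read of it counts the same error again.
    int read(std::size_t i) {
        assert(i < mem_.size());
        if (mem_[i].sb_pending) ++sb_count_;
        return mem_[i].value;
    }

    // Scrub: read every element, write the corrected value back, and then
    // reset the error counter.
    void scrub() {
        for (std::size_t i = 0; i < mem_.size(); ++i) {
            int corrected = read(i);
            mem_[i] = ScrubWord{corrected, false};
        }
        sb_count_ = 0;
    }

    int sb_count() const { return sb_count_; }

    // Number of elements whose stored bits are still corrupted.
    std::size_t latent_errors() const {
        std::size_t n = 0;
        for (const ScrubWord &w : mem_)
            if (w.sb_pending) ++n;
        return n;
    }

private:
    std::vector<ScrubWord> mem_;
    int sb_count_ = 0;
};
```

In this model, reading an unrepaired element twice increments the counter twice, which is why a growing single-bit count can be a reasonable trigger for a scrub. After scrub(), no latent errors remain and the counter is back at zero.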

ECC_RAM can report the error handling process to the standard output when __REPORT_ECC__ is defined.

-- DB error detected at 0       Overriding with default value -1
- SB error corrected at 2       Writing back corrected value
Warning: ECC_RAM is a wrapper around the array that represents the actual memory. In this release, it is not possible to apply memory optimizations (e.g., partitioning) to the underlying data.
Here is a summary of all the API functions:
Class Method Description
ECC_RAM<data_type, depth, SB_WRITE_BACK, DB_OVERRIDE, DB_DEFAULT>() Create a new ECC RAM with the specified parameters.
operator[i] Read or write data at index i and handle errors based on the configuration.
bool read(i, d_i) Read data at index i and save the data read from memory into d_i. Return true if the data is correct, false if a double-bit error is detected.
int sb_count() Return the number of single-bit errors detected and corrected.
int db_count() Return the number of double-bit errors detected.
void reset_counters() Reset both the single-bit and double-bit counters to 0.
void scrub() Read the memory element by element and write each element back. Note that scrub() automatically calls reset_counters() to reset the error counters.