Top-level Driver Options in Pointer Arguments' AXI4 Target Interface Pragma

For a pointer argument, the top-level driver functions will use 'memcpy' or DMA transfer functions based on the dma(<true|false>) option of the argument's interface pragma. If the option is not specified, the top-level driver functions will use 'memcpy' transfer by default.
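
As a minimal sketch of how the dma option might be written, the hypothetical function below requests DMA transfers for its array argument. Only the dma(<true|false>) option and the axi_target interface type come from the text above; the function name, the array size, the "function top" pragma, and the placement of the pragmas at the start of the function body are assumptions for illustration and should be checked against the SmartHLS pragma reference.

```cpp
// Hypothetical top-level function; the name and array size are illustrative.
void scale_by_two(int data[100]) {
#pragma HLS function top
// dma(true) asks the top-level driver to use DMA transfer functions for
// this argument. dma(false), or omitting the option, selects memcpy.
#pragma HLS interface argument(data) type(axi_target) dma(true)
    for (int i = 0; i < 100; i++)
        data[i] = data[i] * 2;
}
```

A standard C++ compiler ignores the unrecognized pragmas, so the same source can still be compiled and run in software.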

SmartHLS determines the input and/or output direction for a pointer argument based on the read and write accesses of the pointer argument in the C++ implementation. That is,

  • If the pointer argument is only being read in the C++ code, then the direction is input and the top-level driver functions will only transfer the data into the HLS module's corresponding buffer before starting the HLS module;

  • If the pointer argument is write-only, then its direction is output and the top-level driver functions will only transfer the data back from the HLS module's corresponding buffer after the HLS module finishes execution;

  • If the pointer argument is both read and written by the C++ code, then its direction is inout and data transfer happens before and after the HLS module execution.
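
The three rules above can be pictured with a small software model. This is only a sketch under stated assumptions, not the code that SmartHLS generates: the Direction enum, the run_with_transfers helper, and the use of memcpy for both transfers are hypothetical names and choices made for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical model of the inferred direction of one pointer argument.
enum class Direction { Input, Output, InOut };

// Hypothetical model of a driver call for one pointer argument.
// `host` is processor memory, `buffer` is the HLS module's buffer,
// and `kernel` stands in for the HLS module's execution.
template <typename Kernel>
void run_with_transfers(int *host, int *buffer, std::size_t n,
                        Direction dir, Kernel kernel) {
    if (dir != Direction::Output)   // input or inout: copy in before starting
        std::memcpy(buffer, host, n * sizeof(int));
    kernel(buffer);
    if (dir != Direction::Input)    // output or inout: copy back after finishing
        std::memcpy(host, buffer, n * sizeof(int));
}
```

In this model an output argument never has its buffer initialized from processor memory, which is the behavior that the cases below depend on.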

However, in some cases the direction analysis is not accurate, and the user must intervene by specifying a requires_copy_in option in the interface pragma. Two such cases are described below.

Case 1

The C++ implementation only writes to a pointer argument, but doesn't write to all elements ("partial update to a write-only memory").

  • For example, suppose an int array[100] argument is configured to use the AXI4 target interface, and the C++ implementation never reads from the array and writes to only one element (e.g., array[k], where k is another argument that changes between invocations). SmartHLS will create a depth-100 buffer for the argument.
  • Since the argument is write-only, by default SmartHLS treats it as an "output", and the top-level drivers will not initialize the buffer by transferring data from processor memory before the HLS module starts. As only one element of the array is updated, the other 99 elements are left uninitialized in the buffer. When the HLS module finishes execution, the top-level drivers transfer all data in the buffer back to processor memory for the "output" argument.
  • The problem is that the 99 uninitialized elements are also copied back and overwrite the elements in processor memory that were correct and should have remained unchanged.
  • To avoid this issue, add a requires_copy_in(true) option to the pointer argument's AXI4 target interface pragma. With this option specified, the top-level drivers initialize the buffer by copying data from processor memory before starting the HLS module's execution. Here is an example pragma:
    #pragma HLS interface argument(array) type(axi_target) requires_copy_in(true)
  • We will improve the direction analysis to detect such cases in a future release.
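
The effect described in Case 1 can be reproduced with a small software model. This is a sketch under stated assumptions rather than SmartHLS-generated code: the run_partial_update function, the written value 42, and the stale value used to stand in for uninitialized buffer contents are all hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical software model of Case 1.
// `host` is the array in processor memory, `k` is the single element the
// module writes, and `stale` stands in for uninitialized buffer contents.
std::vector<int> run_partial_update(std::vector<int> host, std::size_t k,
                                    bool requires_copy_in, int stale = -1) {
    std::vector<int> buffer(host.size(), stale);  // buffer before any transfer
    if (requires_copy_in)
        buffer = host;      // copy-in before the module starts
    buffer[k] = 42;         // the module writes exactly one element
    host = buffer;          // copy-out of the whole buffer
    return host;
}
```

With requires_copy_in set to false, every element except index k comes back as the stale value. With it set to true, the untouched elements keep their original values.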

Case 2

The C++ implementation both reads from and writes to a pointer argument, but only reads elements that it has already written.

  • For example, consider this simple matrix multiplication function:
    void matrix_multiply(int in_A[M][K], int in_B[K][N], int out_C[M][N]) {
        for (int m = 0; m < M; m++) {
            for (int n = 0; n < N; n++) {
                out_C[m][n] = 0;
                for (int k = 0; k < K; k++)
                    out_C[m][n] = out_C[m][n] + in_A[m][k] * in_B[k][n];
            }
        }
    }
  • The out_C argument is both read and written by the algorithm as part of the accumulate operation. By default SmartHLS will consider out_C as "inout" based on the read and write accesses. The top-level driver functions will therefore transfer data to/from the corresponding buffer before and after HLS module execution.
  • However, we can observe that every read of out_C[m][n] happens after out_C[m][n] has first been written with an initial value of 0. This means the initial copy-in transfer to the buffer is not necessary.
  • In this case, the user can explicitly add the requires_copy_in(false) option to the interface pragma, so that the top-level driver functions skip the copy-in transfer before the HLS module's execution:
    #pragma HLS interface argument(out_C) type(axi_target) requires_copy_in(false)
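
The claim that the copy-in is unnecessary can be checked in plain software. The sketch below repeats the matrix_multiply function from Case 2 and compares two runs whose out_C starts with different contents. The matrix sizes, the input values, and the prefill_does_not_matter helper are assumptions made for illustration.

```cpp
constexpr int M = 2, K = 3, N = 2;  // illustrative sizes (assumption)

void matrix_multiply(int in_A[M][K], int in_B[K][N], int out_C[M][N]) {
    // In a SmartHLS project, the requires_copy_in(false) pragma for out_C
    // would be placed here.
    for (int m = 0; m < M; m++) {
        for (int n = 0; n < N; n++) {
            out_C[m][n] = 0;  // written before it is ever read
            for (int k = 0; k < K; k++)
                out_C[m][n] = out_C[m][n] + in_A[m][k] * in_B[k][n];
        }
    }
}

// Hypothetical check: runs the function on two out_C arrays pre-filled with
// different values and reports whether the results are identical.
bool prefill_does_not_matter(int fill_a, int fill_b) {
    int A[M][K] = {{1, 2, 3}, {4, 5, 6}};
    int B[K][N] = {{1, 0}, {0, 1}, {1, 1}};
    int C1[M][N], C2[M][N];
    for (int m = 0; m < M; m++)
        for (int n = 0; n < N; n++) {
            C1[m][n] = fill_a;
            C2[m][n] = fill_b;
        }
    matrix_multiply(A, B, C1);
    matrix_multiply(A, B, C2);
    for (int m = 0; m < M; m++)
        for (int n = 0; n < N; n++)
            if (C1[m][n] != C2[m][n]) return false;
    return true;
}
```

Because the result does not depend on what out_C held beforehand, skipping the copy-in transfer leaves the output unchanged in this model.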

Important: Some AXI4 Target arguments can be accessed concurrently while the accelerator is running. For this feature to work, the argument must be a single element passed by reference or by pointer. See examples below on how to do this. If both the accelerator and the CPU write to the same argument at the same time, the accelerator takes priority over the CPU.