AXI4 Initiator Interface

When the AXI4 initiator interface type is used for a pointer argument (it is not available for global variables), the HLS module will include an AXI4 initiator interface, with the associated ports named axi4initiator_*. The "memory" referenced by the pointer argument is considered external to the HLS module. All memory accesses through the pointer argument are translated into AXI4 read or write transactions on the AXI4 initiator interface. The pragma below specifies the AXI4 initiator interface type for a pointer argument (including array, struct, and class types):

#pragma HLS interface argument(<ARGUMENT_NAME>) type(axi_initiator) \
                      ptr_addr_interface(<simple|axi_target>) \
                      num_elements(<NUM_ARRAY_ELEMENTS>) \
                      max_burst_len(<AXI4I_MAX_BURST_LENGTH>) \
                      max_outstanding_reads(<AXI4I_MAX_OUTSTANDING_READS>) \
                      max_outstanding_writes(<AXI4I_MAX_OUTSTANDING_WRITES>)

Typically the AXI4 initiator interface is connected to the AXI4 target (or AXI4 slave) interface of a memory block such as DDR. The external logic needs to tell the HLS module where in the memory block the data referenced by the pointer argument resides. This is the same concept as a pointer argument of a software function running on a processor: when invoking a function void foo(int *ptr_arg) in software, ptr_arg is essentially the address in processor memory that foo accesses. Hence each argument using the AXI4 initiator interface is associated with a pointer address interface, which specifies the base address in the connected memory block that the HLS module will access. The pointer address interface is configured via the ptr_addr_interface option, which has two modes, simple and axi_target.

  • For simple: the HLS module will have a simple input port with the same name as the pointer argument. The external logic should set the base pointer address on the input port before the HLS module starts.
  • For axi_target: the HLS module will create a register behind the AXI4 target interface. The external logic should use an AXI4 write transaction to set the base pointer address via the AXI4 target interface. The address offset of the register can be found in the 3.5.1.23.1.5 AXI4 Target Interface Address Map section of the 3.5.1.23.1 SmartHLS Report. A driver function will be generated to set the register (see 3.5.1.19.4 AXI4 Initiator Argument's Pointer Address Driver Functions).
  • The default ptr_addr_interface is simple. However, when the default interface pragma sets the default type to axi_target (#pragma HLS interface default type(axi_target)), the default ptr_addr_interface becomes axi_target.
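
As a minimal sketch of the simple mode, the hypothetical function below requests a plain input port for the base address. The function name scale_by_two and the element count are illustrative assumptions, not taken from SmartHLS. A regular C++ compiler ignores the HLS pragmas, so the function can also be run in software.

```cpp
// Hypothetical example: the function name and element count are assumptions.
#define SCALE_ELEMENTS 8

void scale_by_two(int *data) {
#pragma HLS function top
// With ptr_addr_interface(simple), the base address is expected on an input
// port named after the argument ("data") before the module starts.
#pragma HLS interface argument(data) type(axi_initiator)                       \
    ptr_addr_interface(simple) num_elements(SCALE_ELEMENTS)
    for (int i = 0; i < SCALE_ELEMENTS; i++) {
        // Each iteration reads one element and writes one element
        // through the pointer argument.
        data[i] = data[i] * 2;
    }
}
```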

To further illustrate the pointer address, consider the following example:

void incr(char *ptr) {
    #pragma HLS interface argument(ptr) type(axi_initiator) ptr_addr_interface(axi_target)
    for (int i = 0; i < 10; i++) {
        *ptr = *ptr + 3;
        ptr++;  // Increment the pointer by 1 byte.
    }
}

In this example, when we start the incr hardware block, we need to pass the base pointer address using the AXI4 target interface, as specified by ptr_addr_interface(axi_target). Assuming we write a base address of 0xFFFF0000 to the AXI4 target interface, the module will read the data, add 3, and write the new data back for each of the ten char elements at memory addresses 0xFFFF0000 through 0xFFFF0009.
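
The behaviour described above can be checked in plain software, where the pointer is an ordinary address. The sketch below repeats incr and adds a hypothetical helper, incr_matches_description, which runs it on a 10-byte buffer standing in for the external memory at the base address. The helper name and the buffer contents are assumptions for illustration only.

```cpp
void incr(char *ptr) {
#pragma HLS interface argument(ptr) type(axi_initiator) ptr_addr_interface(axi_target)
    for (int i = 0; i < 10; i++) {
        *ptr = *ptr + 3;
        ptr++; // Increment the pointer by 1 byte.
    }
}

// Hypothetical software check (not part of SmartHLS): returns true when all
// ten bytes starting at the base address were incremented by 3.
bool incr_matches_description() {
    char memory[10];
    for (int i = 0; i < 10; i++)
        memory[i] = static_cast<char>(i);

    incr(memory); // &memory[0] plays the role of the base pointer address.

    for (int i = 0; i < 10; i++)
        if (memory[i] != static_cast<char>(i + 3))
            return false;
    return true;
}
```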

The num_elements option is only available for array type arguments. It specifies the array size, or overrides the size declared in C++. The 3.5.1.7 Simulate HLS Hardware (SW/HW Co-Simulation) feature needs this option to know the size of the external memory to model in the simulation testbench; the option does not affect the HLS-generated RTL/circuit.

Important:

AXI4 initiator burst support is currently a beta feature in SmartHLS, and is under active development. Data transfer throughput will be improved and additional features will be added in subsequent SmartHLS releases.

By default, each read or write to an AXI4 initiator pointer argument in the C++ code becomes a non-burst AXI4 transaction. However, when the read or write is inside a loop, SmartHLS can detect when a burst transaction can be used instead, and will combine the reads or writes that occur over multiple loop iterations into an AXI4 burst transaction. To infer an AXI4 burst transfer from a loop, the loop should have the following properties:

  • The loop should be pipelined (see 3.5.1.8 Loop Pipelining).
  • The loop bound should be known before the loop executes.
  • The loop should have no more than 1 read and no more than 1 write to AXI4 initiator pointer arguments.
  • The addressing for the reads and writes should be incrementing by 1 word per loop iteration.
  • The reads and writes should not be inside conditional statements.

If the user expects a burst read or write to an AXI4 initiator pointer argument, they can use the max_burst_len option in the interface pragma to specify the burst length to use when doing a burst read or write transaction for that argument. If the max_burst_len is unspecified, SmartHLS will use a default value of 16. An example of using a loop that meets all the criteria to infer an AXI4 initiator burst is shown below:

#define NUM_ELEMENTS 1000

void init_array(int *out_array) {
#pragma HLS function top
// In the interface argument, set the max burst length to 64
#pragma HLS interface argument(out_array) type(axi_initiator)                  \
    ptr_addr_interface(axi_target) num_elements(NUM_ELEMENTS)                  \
        max_burst_len(64)
// Burstable
#pragma HLS loop pipeline
    for (unsigned idx = 0; idx < NUM_ELEMENTS; ++idx) {
        out_array[idx] = idx;
    }
}

An example of loops that do not meet the burst criteria (and will produce warnings if pipelined) is shown below:

#define NUM_ELEMENTS 1000

void init_array(int *out_array) {
#pragma HLS function top
// In the interface argument, set the max burst length to 64
#pragma HLS interface argument(out_array) type(axi_initiator)                  \
    ptr_addr_interface(axi_target) num_elements(NUM_ELEMENTS)                  \
        max_burst_len(64)

// Not burstable - two writes per iteration
#pragma HLS loop pipeline
    for (unsigned idx = 0; idx < NUM_ELEMENTS / 2; ++idx) {
        out_array[idx] = idx;
        out_array[idx * 2] = idx + 1;
    }

// Not burstable - address is not incrementing by one
#pragma HLS loop pipeline
    for (unsigned idx = 0; idx < NUM_ELEMENTS / 2; ++idx) {
        out_array[idx * 2] = idx + 1;
    }

// Not burstable - write is inside conditional code
#pragma HLS loop pipeline
    for (unsigned idx = 0; idx < NUM_ELEMENTS; ++idx) {
        if (idx % 10)
            out_array[idx] = idx;
    }
}
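
In some cases a non-burstable loop can be restructured to satisfy the criteria listed earlier. As a hedged sketch, the hypothetical variant below assumes it is acceptable to write a value to every element. It writes 0 where the conditional version skipped the write, so the resulting memory contents differ from the conditional loop. The condition is moved into the computation of the stored value, which makes the write itself unconditional. The function name is an assumption, and whether SmartHLS actually infers a burst for this loop should be confirmed with the tool.

```cpp
#define NUM_ELEMENTS 1000

// Hypothetical rewrite of the conditional loop; the name is illustrative.
void init_array_unconditional(int *out_array) {
#pragma HLS function top
#pragma HLS interface argument(out_array) type(axi_initiator)                  \
    ptr_addr_interface(axi_target) num_elements(NUM_ELEMENTS)                  \
        max_burst_len(64)
#pragma HLS loop pipeline
    for (unsigned idx = 0; idx < NUM_ELEMENTS; ++idx) {
        // Select the value first; the store below executes every iteration
        // and its address advances by one word per iteration.
        int value = (idx % 10) ? static_cast<int>(idx) : 0;
        out_array[idx] = value;
    }
}
```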

SmartHLS also supports leaving multiple AXI4 Initiator burst requests outstanding without stalling the accelerator. By sending more burst requests in advance, the AXI4 target has more time to respond, potentially reducing the time the SmartHLS accelerator needs to wait. Using this feature infers internal FIFOs in the design, of size max_outstanding_<reads/writes> * addr_size for the ARADDR and AWADDR channels and max_outstanding_<reads/writes> * max_burst_len * word_size for the RDATA and WDATA channels. The max_burst_len and max_outstanding_<reads/writes> options can be set separately for input and output AXI4 Initiator arguments, allowing the user to optimize read and write burst transactions separately. An example of how to use the interface pragmas to specify this behaviour is shown below. In this example, addr_size depends on the AXI4 interface address width (SmartHLS will generate an interface with address width 64), and word_size depends on the argument type (in this case, since the argument is an unsigned int pointer, the word size is 32).

#define NUM_ELEMENTS 10000
#define USE_OPTIMIZED 1

// Use AXI initiator to copy from one external memory to another.
void copy_array(unsigned *in_array, unsigned *out_array, unsigned num_elements) {
#ifndef USE_OPTIMIZED
#pragma HLS function top
#pragma HLS interface default type(axi_target)
#pragma HLS interface argument(in_array) type(axi_initiator)                   \
    num_elements(NUM_ELEMENTS)
#pragma HLS interface argument(out_array) type(axi_initiator)                  \
    num_elements(NUM_ELEMENTS)
#endif
    for (unsigned idx = 0; idx < num_elements; ++idx) {
        out_array[idx] = in_array[idx];
    }
}
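
As a hedged sketch of what a tuned variant of copy_array might look like, the hypothetical function below sets max_burst_len on both arguments, max_outstanding_reads on the input argument, and max_outstanding_writes on the output argument. The function name and all numeric values (burst length 64, 4 outstanding requests) are assumptions chosen for illustration, not recommended settings.

```cpp
#define NUM_ELEMENTS 10000

// Hypothetical tuned variant; the name and option values are assumptions.
void copy_array_tuned(unsigned *in_array, unsigned *out_array,
                      unsigned num_elements) {
#pragma HLS function top
#pragma HLS interface default type(axi_target)
// Read side: burst length and number of outstanding read requests.
#pragma HLS interface argument(in_array) type(axi_initiator)                   \
    num_elements(NUM_ELEMENTS) max_burst_len(64) max_outstanding_reads(4)
// Write side: configured independently of the read side.
#pragma HLS interface argument(out_array) type(axi_initiator)                  \
    num_elements(NUM_ELEMENTS) max_burst_len(64) max_outstanding_writes(4)
// Pipelined loop with one read and one write per iteration, both advancing
// by one word per iteration.
#pragma HLS loop pipeline
    for (unsigned idx = 0; idx < num_elements; ++idx) {
        out_array[idx] = in_array[idx];
    }
}
```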

The values specified for these interface pragmas can be tuned to optimize the throughput of the AXI4 Initiator interface. For more information on how to pick these values to improve performance, see Optimizing AXI4 Initiator Performance. For more information on the semantics of the interface pragmas themselves, see the relevant Pragma Guide section: 3.6.1.16 AXI4 Initiator Interface for Pointer Argument. If finer control of the burst transfer is required, consider 3.5.1.18.4.1 Implementing A Custom AXI4 Master/Slave Using hls::FIFO.
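
When choosing these values, the internal FIFO sizes given by the formulas earlier in this section can be estimated up front. The constexpr helpers below are a hypothetical sketch: the function names and the sample values of 4 outstanding requests and a burst length of 64 are assumptions. They use the 64-bit address width and 32-bit word size mentioned above, and assume the widths in the formulas are expressed in bits.

```cpp
// Hypothetical helpers (not part of SmartHLS) that evaluate the FIFO size
// formulas from this section. All results are in bits.

// Address channel FIFO (ARADDR or AWADDR): max_outstanding * addr_size.
constexpr unsigned long addr_fifo_bits(unsigned max_outstanding,
                                       unsigned addr_size) {
    return static_cast<unsigned long>(max_outstanding) * addr_size;
}

// Data channel FIFO (RDATA or WDATA):
// max_outstanding * max_burst_len * word_size.
constexpr unsigned long data_fifo_bits(unsigned max_outstanding,
                                       unsigned max_burst_len,
                                       unsigned word_size) {
    return static_cast<unsigned long>(max_outstanding) * max_burst_len *
           word_size;
}

// Sample values: 4 outstanding requests, burst length 64,
// 64-bit addresses, 32-bit words.
static_assert(addr_fifo_bits(4, 64) == 256, "4 * 64 bits");
static_assert(data_fifo_bits(4, 64, 32) == 8192, "4 * 64 * 32 bits");
```

Doubling either max_outstanding or max_burst_len doubles the data FIFO estimate, which is one way to compare candidate settings before running the tool.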