3.5.1.19.6 Example Usage of the Driver Functions

Consider the following C++ code as the example input for SmartHLS to synthesize:

// Top-level function that is to be synthesized into a hardware accelerator.
int test_function(uint8_t input_array[10], uint8_t output_array[10], uint8_t &status_code) {
    #pragma HLS function top
    #pragma HLS interface default type(axi_target)
    // Accelerator implementation
    ...
}

// Original user code that calls the top-level function.
uint8_t input_array[10] = ...
uint8_t output_array[10];
uint8_t current_status;
int result = test_function(input_array, output_array, current_status); // status_code is a reference, so pass the variable itself
...

The following examples show several ways to use the driver functions to invoke the HLS accelerator.

  • Invoke the HLS accelerator using a single top-level driver function that replaces the original call to the software implementation:
    uint8_t input_array[10] = ...
    uint8_t output_array[10];
    uint8_t current_status;
    
    // Invoke the hardware accelerator in a single (blocking) call.
    int result = test_function_hls_driver(input_array, output_array, &current_status);
    
  • If the software runs on the Icicle kit under a Linux operating system, and an argument is configured to use DMA transfer in the top-level driver functions (by specifying the dma(true) option in the argument's interface pragma), then that argument must be allocated with hls_malloc():
    #include "hls/hls_alloc.h"
    uint8_t *input_array = (uint8_t *) hls_malloc(10);
    uint8_t *output_array = (uint8_t *) hls_malloc(10);
    // ... fill input_array ...
    uint8_t current_status; // Arguments without dma(true) do not need hls_malloc().
    
    // Invoke the hardware accelerator in a single (blocking) call.
    int result = test_function_hls_driver(input_array, output_array, &current_status);
    
    hls_free(input_array);
    hls_free(output_array);
  • Invoke and join the HLS accelerator in two steps. Each step is blocking, but other code can run between steps:
    uint8_t input_array[10] = ...
    uint8_t output_array[10];
    uint8_t current_status;
    
    // Transfer input arguments and start the HLS accelerator.
    test_function_write_input_and_start(input_array);
    
    // Execute other software code on the processor here while the HLS accelerator is running.
    ...
    
    // While the accelerator is running, single-element arguments (passed by
    // reference or by pointer) can already be read.
    test_function_memcpy_read_status_code(&current_status, 1);
    
    // Later, wait for the HLS accelerator to finish and read the outputs and return value.
    int result = test_function_join_and_read_output(output_array, &current_status);
  • Use each argument's own driver function to transfer its input or output data, and use the module control functions to start the HLS accelerator and wait for it to finish:
    uint8_t input_array[10] = ...
    uint8_t output_array[10];
    uint8_t current_status;
    
    // Transfer input arguments, using memcpy transfer method for arrays.
    test_function_memcpy_write_input_array(input_array, 10);
    
    // Start the accelerator
    test_function_start();
    
    // Execute other software code on the processor here while the HLS accelerator is running.
    ...
    
    // Poll for accelerator finish, read return value.
    int result = test_function_join();
    
    // Transfer output arguments, using memcpy transfer method for arrays.
    test_function_memcpy_read_output_array(output_array, 10);
    test_function_memcpy_read_status_code(&current_status, 1);
  • Same as above, but using DMA transfer instead of memcpy, assuming the software runs on a Linux operating system on the Icicle kit:
    // Arrays transferred by DMA must be allocated with hls_malloc().
    #include "hls/hls_alloc.h"
    uint8_t *input_array = (uint8_t *) hls_malloc(10);
    uint8_t *output_array = (uint8_t *) hls_malloc(10);
    // ... fill input_array ...
    uint8_t current_status; // dma(false): a regular variable, no hls_malloc() needed
    
    // Transfer input arguments, using DMA transfer method for arrays.
    test_function_dma_write_input_array(input_array, 10);
    
    // Start the accelerator
    test_function_start();
    
    // Execute other software code on the processor here while the HLS accelerator is running.
    ...
    
    // Poll for accelerator finish, read return value
    int result = test_function_join();
    
    // Transfer output arguments, using DMA transfer method for arrays.
    test_function_dma_read_output_array(output_array, 10);
    test_function_memcpy_read_status_code(&current_status, 1); // not hls_malloc'd, so use memcpy
    ...
    
    hls_free(input_array);
    hls_free(output_array);