3.5.1 User Guide

3.5.1.1 Introduction to High-Level Synthesis

High-level synthesis (HLS) refers to the synthesis of a hardware circuit from a software program specified in a high-level language, where the hardware circuit performs the same functionality as the software program. For SmartHLS™, the input is a C/C++-language program, and the output is a circuit specification in the Verilog hardware description language. The SmartHLS-generated Verilog can be given to Libero to be programmed on a Microchip FPGA. The underlying motivation for HLS is to raise the level of abstraction for hardware design, by allowing software methodologies to be used to design hardware. This can help to shorten design cycles, improve design productivity and reduce time-to-market.

While a detailed knowledge of HLS is not required to use SmartHLS, it is worthwhile to highlight the key steps involved in converting software to hardware. The four main steps involved in HLS are allocation, scheduling, binding, and RTL generation, which run one after another (for example, binding runs after scheduling is done).

Allocation: The allocation step defines the constraints on the generated hardware, including the number of hardware resources of a given type that are used (for example, how many divider units are used, the number of RAM ports and so on), as well as the target clock period for the hardware, and other user-supplied constraints.

Scheduling: Software programs are written without any notion of a clock or finite state machine (FSM). The scheduling step of HLS bridges this gap by assigning the computations in the software to specific clock cycles in hardware. Given the user-provided target clock period constraint (for example, 10 ns), scheduling assigns operations to clock cycles such that the combined delay of the operations in each cycle does not exceed the target clock period. In addition, the scheduling step ensures that the data dependencies between the operations are met.

Binding: While a software program contains an arbitrary number of operations of a given type (for example, multiplications), the hardware contains only a limited number of units capable of performing such a computation. The binding step of HLS is to associate (bind) each computation in the software with a specific unit in the hardware.

RTL generation: Using the analysis from the previous steps, the final step of HLS is to generate a description of the circuit in a hardware description language (Verilog).
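
The scheduling step described above can be illustrated with a toy model. The following is a minimal sketch, not the algorithm SmartHLS actually uses: it assumes a plain as-soon-as-possible scheduler, operations listed in dependency order, and made-up operation delays. The names `Op`, `Slot` and `schedule` are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Op {
    double delay_ns;        // combinational delay of the operation (made-up numbers)
    std::vector<int> deps;  // indices of operations whose results it consumes
};

struct Slot {
    int cycle;      // clock cycle (FSM state) the operation is assigned to
    double end_ns;  // time within that cycle at which its result is ready
};

// Assigns each operation to the earliest cycle that respects its data
// dependencies and keeps the chained delay within the clock period.
// Operations must be listed with dependencies before their users.
std::vector<Slot> schedule(const std::vector<Op>& ops, double period_ns) {
    std::vector<Slot> slots;
    for (const Op& op : ops) {
        assert(op.delay_ns <= period_ns);  // multi-cycle operations are not modelled
        int cycle = 0;
        double start_ns = 0.0;
        for (int d : op.deps) {
            const Slot& s = slots[d];
            if (s.cycle > cycle) {
                cycle = s.cycle;
                start_ns = s.end_ns;
            } else if (s.cycle == cycle) {
                start_ns = std::max(start_ns, s.end_ns);
            }
        }
        // If chaining would exceed the period, start from registers in the next cycle.
        if (start_ns + op.delay_ns > period_ns) {
            ++cycle;
            start_ns = 0.0;
        }
        slots.push_back({cycle, start_ns + op.delay_ns});
    }
    return slots;
}
```

With two 6 ns multiplications feeding a chain of two 3 ns additions, a 10 ns period places the first addition in the same cycle as the multiplications and pushes the second addition to the next cycle. Tightening the period to 8 ns moves both additions into the second cycle.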

Executing computations in hardware brings speed and energy advantages over performing the same computations in software running on a processor. The underlying reason for this is that the hardware is dedicated to the computational work being performed, whereas a processor is generic and has the inherent overheads of fetching/decoding instructions, loading/storing from/to memory, etc. Further acceleration is possible by exploiting hardware parallelism, where computations can execute concurrently. With SmartHLS, one can exploit four styles of hardware parallelism, which are instruction-level, loop-level, thread-level, and function-level parallelism.

Option	Description
modules	Group all the settings for each top-level module
log_level	Specifies what in/out ports of the module get instrumented: 0: No instrumentation. 1: The tool instruments all simple ports and only AXI handshaking signals (i.e. valid/ready). Tries to reduce resource overhead when using AXI interfaces. 2: (Default) the tool instruments all the ports from log level 1, plus the AXI data and address ports. Meant to provide a balanced approach. 3: In log level 3, the tool instruments all the ports of the module. Will require the most resources but provides maximum visibility on all ports. See Table 3-3 for a list of AXI signals that will be instrumented at each log level. By default, this is set to 2 for all modules.
fifo_log_level	Specifies what visibility is desired for all FIFOs in the SmartHLS modules. 0: (Default) No instrumentation. 1: Only FIFO full, empty and occupancy (number of words in the FIFO) are instrumented. 2: Signals included in FIFO_LOG_LEVEL=1 plus FIFO write_enable, read_enable, write_data and read_data. 3: All FIFO signals. This includes almost_full and almost_empty, among others. By default, this is set to 0 for all modules.
dashboard	Group all monitoring mode related settings. For more information on the monitoring mode, see Modes.
max_iterations	Specifies how many times the debugger should be run. If set to -1, the debugger will run infinitely. Only applicable when using the monitoring mode. The default value is -1.
show_markers	Toggles whether markers should be drawn in the waveform when a captured sample dump ends. Only applicable when using the monitoring mode. The default value is 1.
monitoring_mode	1: Monitoring mode 0: (Default) Debugging mode See the Monitoring/Debugging modes section for more information.
waveform_period	The waveform period in nanoseconds that Identify will use when writing captured samples to a VCD file. By default, this is set to the period of the HLS module.
iice_options	Group all settings related to the automatically generated IICE
sample_buffer_depth	The depth (number of samples) of the IICE generated for the HLS project. There is one IICE per SmartHLS project. The default value is 1024.
iice_name	The name of the IICE generated for the SmartHLS project. If this is an empty string, the name of the IICE will be IICE_[HLS project folder name]. This is left empty by default.

Log Level	AXI Memory-Map and AXI Stream Signals that will be Instrumented
1	valid, ready
2	All log level 1 signals, plus addr, data, len, burst, size, strb
3	All the signals

File	Purpose
fifo_list	File with a list of all the FIFOs in the design, as well as their almost_full and almost_empty values.
instrument_config.tcl	Configuration file with information required by both the monitoring script and visualizing scripts.
monitor.tcl	Script that repeatedly runs the debugger for the purpose of monitoring. This should be run by identify_debugger.
prj_<HLS PROJECT_NAME>_hls_identify.tcl	Instrumentation script that opens the Synplify synthesis project, and then instruments the SmartHLS modules by sourcing <HLS PROJECT_NAME>_hls_identify.tcl
<HLS PROJECT_NAME>_hls_identify.tcl	Instrumentation script that creates the IICE for the SmartHLS project. Then it sources all the identify_instrument_<TOP LEVEL MODULE NAME>.tcl files.
identify_instrument_<TOP LEVEL MODULE NAME>.tcl	Script to instrument (add) all the selected signals based on the selected log level for the TOP LEVEL MODULE NAME. One of these files is generated for every top-level module in your SmartHLS project.
update_vcd.tcl	Script used to refresh the waveform viewer with new samples (debugging mode), or append new samples on the existing waveform (monitoring mode).
wave_template.do	Waveform template that automatically groups the instrumented HLS signals. This should be run in your ModelSim waveform viewer.
fifo_dashboard_wave_template.do	Waveform template that arranges all the FIFOs' occupancy (usedw signals) at the bottom of the waveform for easy viewing. This should be run in ModelSim waveform viewer.
vsim_keyboard_binding	Keyboard binding for ModelSim that refreshes the waveform with newly captured signals every time Ctrl+R is pressed for convenience.

Class Method	Description
`FIFO<T> ()`	Create a new FIFO.
`FIFO<T> (unsigned depth)`	Create a new FIFO with the specified depth.
`void write(T data)`	Write `data` to the FIFO.
`T read()`	Read an element from the FIFO.
`bool empty()`	Returns 1 if the FIFO is empty.
`bool full()`	Returns 1 if the FIFO is full.
`void setDepth(unsigned depth)`	Set the FIFO’s depth.
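
To show how the methods in the table fit together, the sketch below uses a host-only stand-in class. `MockFIFO` is hypothetical and is not the SmartHLS `FIFO<T>` class: it only mirrors the method names listed above, uses an arbitrary default depth, and asserts instead of blocking when a read or write cannot proceed. The function `scale_by_two` is likewise made up for illustration.

```cpp
#include <cassert>
#include <deque>

// Host-only stand-in mirroring the method names in the table above.
// It is NOT the SmartHLS FIFO class.
template <typename T>
class MockFIFO {
public:
    MockFIFO() : depth_(2) {}  // default depth chosen arbitrarily for this sketch
    explicit MockFIFO(unsigned depth) : depth_(depth) {}

    void write(T data) {
        assert(!full());  // a hardware FIFO would stall here instead
        q_.push_back(data);
    }
    T read() {
        assert(!empty());  // a hardware FIFO would stall here instead
        T v = q_.front();
        q_.pop_front();
        return v;
    }
    bool empty() const { return q_.empty(); }
    bool full() const { return q_.size() >= depth_; }
    void setDepth(unsigned depth) { depth_ = depth; }

private:
    unsigned depth_;
    std::deque<T> q_;
};

// A processing stage that communicates only through FIFOs: it drains the
// input while there is data to read and room to write.
void scale_by_two(MockFIFO<int>& in, MockFIFO<int>& out) {
    while (!in.empty() && !out.full()) {
        out.write(2 * in.read());
    }
}
```

Checking `empty()` and `full()` before each access keeps the stage from reading or writing when it cannot make progress, which is the role these two methods play in the table.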

Parameter	Description
W	The width of the word in bits.
I_W	How far the most significant bit is above the decimal. I_W can be negative. I_W > 0 implies the MSB is above the decimal. I_W <= 0 implies the MSB is below the decimal. If W >= I_W >= 0, then I_W is the number of bits used for the integer portion.
Q_M	The Quantization (rounding) mode used when a result has precision below the least significant bit. Defaults to AP_TRN.
	AP_TRN	Truncate bits below the LSB bringing the result closer to -∞.
	AP_TRN_ZERO	Truncate bits below the LSB bringing the result closer to zero.
	AP_RND	Round to the nearest representable value with the mid-point going towards +∞.
	AP_RND_INF	Round to the nearest representable value with the mid-point going towards -∞ for negative numbers, and +∞ for positive numbers.
	AP_RND_MIN_INF	Round to the nearest representable value with the mid-point going towards -∞.
	AP_RND_ZERO	Round to the nearest representable value with the mid-point going towards 0.
	AP_RND_CONV	Round to the nearest representable value with the mid-point going towards the nearest even multiple of the quantum. (This helps to remove bias in rounding).
O_M	The Overflow mode used when a result exceeds the maximum or minimum representable value. Defaults to AP_WRAP.
	AP_WRAP	Wraparound between the minimum and maximum representable values in the range.
	AP_SAT	On positive and negative overflow saturate the result to the maximum or minimum value in the range respectively.
	AP_SAT_ZERO	On any overflow set the result to zero.
	AP_SAT_SYM	On positive and negative overflow saturate the result to the maximum or minimum value in the range symmetrically about zero. For ap_ufixpt this is the same as AP_SAT.
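
The quantization and overflow modes above can be emulated on ordinary numbers to see how they differ. The following is a minimal sketch under stated assumptions: it works on `double` and `long long` values rather than on the library types, the enum and function names are local to the sketch, and for the symmetric saturation mode it leaves values that are already in range unchanged.

```cpp
#include <cassert>
#include <cmath>

// Local names for this sketch; they only echo the mode names in the table.
enum class QMode { TRN, TRN_ZERO, RND, RND_INF, RND_MIN_INF, RND_ZERO, RND_CONV };
enum class OMode { WRAP, SAT, SAT_ZERO, SAT_SYM };

// Returns the number of quanta (multiples of `quantum`) that represents x.
long long quantize(double x, double quantum, QMode mode) {
    const double v = x / quantum;
    const double f = std::floor(v);
    switch (mode) {
        case QMode::TRN:         return static_cast<long long>(f);
        case QMode::TRN_ZERO:    return static_cast<long long>(std::trunc(v));
        case QMode::RND:         return static_cast<long long>(std::floor(v + 0.5));
        case QMode::RND_INF:     return static_cast<long long>(std::round(v));
        case QMode::RND_MIN_INF: return static_cast<long long>(std::ceil(v - 0.5));
        case QMode::RND_ZERO:
            return static_cast<long long>(v >= 0 ? std::ceil(v - 0.5) : std::floor(v + 0.5));
        case QMode::RND_CONV:
            if (v - f != 0.5) return static_cast<long long>(std::floor(v + 0.5));
            // Exactly at the mid-point: pick the even neighbour.
            return static_cast<long long>(std::fmod(f, 2.0) == 0.0 ? f : f + 1.0);
    }
    return 0;  // unreachable
}

// Maps a raw signed integer onto a W-bit two's-complement range.
long long overflow(long long raw, int W, OMode mode) {
    assert(W >= 2 && W <= 32);
    const long long top = (1LL << (W - 1)) - 1;
    const long long bottom = -top - 1;
    if (mode == OMode::WRAP) {
        const long long span = 1LL << W;
        return ((raw - bottom) % span + span) % span + bottom;
    }
    if (raw >= bottom && raw <= top) return raw;  // no overflow
    switch (mode) {
        case OMode::SAT:     return raw > top ? top : bottom;
        case OMode::SAT_SYM: return raw > top ? top : -top;
        default:             return 0;  // SAT_ZERO
    }
}
```

For example, with a quantum of 0.25 the value 0.625 sits exactly on a mid-point (2.5 quanta), so the rounding modes disagree on whether it becomes 2 or 3 quanta; the same holds for -0.625 between -3 and -2.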

Type	Quantum	Range	AP_SAT_SYM Range
ap_ufixpt	2^(I_W - W)	0 to 2^(I_W) - Q	0 to 2^(I_W) - Q
ap_fixpt	2^(I_W - W)	-2^(I_W - 1) to 2^(I_W - 1) - Q	-2^(I_W - 1) + Q to 2^(I_W - 1) - Q

Type	Quantum	Range
ap_fixpt<8, 4>	0.0625	-8 to 7.9375
ap_ufixpt<4, 12>	256	0 to 3840
ap_ufixpt<4, -2>	0.015625	0 to 0.234375
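
The quantum and range formulas above can be checked numerically. The sketch below evaluates them with `std::ldexp` on plain `double` values; `FixedRange` and `fixpt_range` are hypothetical helper names used only here, not part of the SmartHLS library.

```cpp
#include <cassert>
#include <cmath>

struct FixedRange {
    double quantum;  // smallest representable step, 2^(I_W - W)
    double min;      // smallest representable value
    double max;      // largest representable value
};

// Evaluates the quantum and range formulas for a W-bit fixed-point type
// whose most significant bit sits I_W positions above the binary point.
FixedRange fixpt_range(int W, int I_W, bool is_signed) {
    assert(W >= 1);
    const double q = std::ldexp(1.0, I_W - W);  // 2^(I_W - W)
    if (is_signed) {
        const double half = std::ldexp(1.0, I_W - 1);  // 2^(I_W - 1)
        return {q, -half, half - q};
    }
    return {q, 0.0, std::ldexp(1.0, I_W) - q};
}
```

Calling `fixpt_range(8, 4, true)` gives a quantum of 0.0625 and a range of -8 to 7.9375, and `fixpt_range(4, -2, false)` gives a quantum of 0.015625 and a range of 0 to 0.234375, matching the rows of the table.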

Type	Operator	Description	ap_[u]int	ap_[u]fixpt	floating
Arithmetic	+	Addition	Y	Y	Y
	-	Subtraction	Y	Y	Y
	*	Multiplication	Y	Y	Y
	/	Division	Y	Y	Y
	%	Modulo	Y	Y	Note Below
	++	Increment	Y	Y	Y
	--	Decrement	Y	Y	Y
Assignment	=	Assignment	Y	Y	Y
	+=	Add and assign	Y	Y	Y
	-=	Sub and assign	Y	Y	Y
	*=	Mult and assign	Y	Y	Y
	/=	Div and assign	Y	Y	Y
	%=	Mod and assign	Y	Y	Note Below
	&=	Bitwise AND and assign	Y	Y	N/A
	\|=	Bitwise OR and assign	Y	Y	N/A
	^=	Bitwise XOR and assign	Y	Y	N/A
	>>=	SHR and assign	Y	Y	N/A
	<<=	SHL and assign	Y	Y	N/A
Comparison	==	Equal to	Y	Y	Y
	!=	Not equal to	Y	Y	Y
	>	Greater than	Y	Y	Y
	<	Less than	Y	Y	Y
	>=	Greater than or equal to	Y	Y	Y
	<=	Less than or equal to	Y	Y	Y
Bitwise	&	Bitwise AND	Y	Y	N/A
	^	Bitwise XOR	Y	Y	N/A
	\|	Bitwise OR	Y	Y	N/A
	~	Bitwise Not	Y	Y	N/A
	.or_reduce()	Bitwise OR reduction	Y	Y	N/A
Shift	<<	Shift left	Y	Y	N/A
	>>	Shift right (Signed: `ashr`, unsigned: `lshr`)	Y	Y	N/A
	.lshr(ap_uint numbits)	Logical shift right	Y	Y	N/A
	.ashr(ap_uint numbits)	Arithmetic shift right	Y	Y	N/A
Bit level access	num(a, b)	Range selection	Y	Y	N/A
	num.range(a, b)	Range selection	Y	Y	N/A
	num[a]	Bit selection	Y	Y	N/A
	num.byte(n, s = 8)	Select `n-th` byte with `s` bits per byte	Y	Y	N/A
	num.bytes(m, n, s = 8)	Select `m-th` to `n-th` byte (inclusive) with `s` bits per byte	Y	Y	N/A
	(numa, numb, numc)	Concat	Y	Y	N/A
Explicit Conversion	.to_ufixpt()	Convert to ap_ufixpt	Y	N/A	N/A
	.to_fixpt()	Convert to ap_fixpt	Y	N/A	N/A
	.to_uint64()	Convert to uint64	Y	N/A	N/A
	.to_int64()	Convert to int64	Y	N/A	N/A
	.raw_bits()	Convert to raw bits	N/A	Y	N/A
	.from_raw_bits()	Convert from raw bits	N/A	Y	N/A
	.to_double()	Convert to double	N/A	Y	N/A
String Conversion	.to_fixpt_string()	Convert to fixpt string	N/A	Y	N/A
String Conversion	.to_string()	Convert to int string	Y	Y	N/A
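
The bit-level access operations in the table can be pictured with plain integer arithmetic. The sketch below is an emulation on `uint64_t`, not the library implementation. It assumes that bit 0 is the least significant bit, that a range is given as (high, low) and is inclusive at both ends, and that byte 0 is the least significant byte. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bits hi..lo of x, inclusive (the (hi, lo) argument order is an assumption).
uint64_t bit_range(uint64_t x, unsigned hi, unsigned lo) {
    assert(hi >= lo && hi < 64);
    const unsigned width = hi - lo + 1;
    const uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    return (x >> lo) & mask;
}

// Single-bit selection, comparable to num[a].
bool bit_at(uint64_t x, unsigned i) { return bit_range(x, i, i) != 0; }

// n-th byte with s bits per byte, comparable to num.byte(n, s).
uint64_t byte_at(uint64_t x, unsigned n, unsigned s = 8) {
    return bit_range(x, (n + 1) * s - 1, n * s);
}

// Concatenation of two fields; lo_width is the bit width of the lower field.
uint64_t concat(uint64_t hi_part, uint64_t lo_part, unsigned lo_width) {
    assert(lo_width >= 1 && lo_width < 64);
    return (hi_part << lo_width) | bit_range(lo_part, lo_width - 1, 0);
}

// OR of all bits, comparable to .or_reduce().
bool or_reduce(uint64_t x) { return x != 0; }
```

Under these assumptions, selecting bits 7 down to 4 of 0xBC yields 0xB, and concatenating 0xB with the 4-bit field 0xC rebuilds 0xBC.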

Producer Side Methods	Description
`T &producer()`	Returns a reference to the buffer that should be used exclusively by the producer function. The reference stays unchanged throughout the entire lifetime of the buffer object. Hence the producer function typically only needs to call this method once. Although the producer function is meant to store output to the buffer, the producer function can still read back the self-written data via the reference.
`void producer_acquire()`	Acquires a buffer for the producer to store its output. After this function returns, the producer may start writing to the buffer. This method is a blocking call: if a buffer is available, the method returns immediately; otherwise the method blocks until the consumer side calls the `consumer_release()` method to release a buffer. Initially all buffers (2 for `DoubleBuffer`, 1 for `SharedBuffer`) are available for the producer function to acquire.
`void producer_release()`	Releases the previously acquired buffer after the producer finishes writing its output. The released buffer can then be acquired by the consumer to access as input. This method is not a blocking call and returns immediately. If the producer does not hold an acquired buffer when calling this method, the method returns without doing anything and no buffer is released.
Consumer Side Methods
`T &consumer()`	Returns a reference to the buffer that should be used exclusively by the consumer function. The reference stays unchanged throughout the entire lifetime of the buffer object. Hence the consumer function typically only needs to call this method once. Although the consumer function is meant to read the buffer as input, the consumer function can also write data to the buffer via the reference. However the data written by the consumer function won’t be visible to the producer function.
`void consumer_acquire()`	Acquires a producer-released buffer in the `DoubleBuffer` or `SharedBuffer` for the consumer to access as input. This method is a blocking call – if there is a producer-released buffer available, the method returns immediately; otherwise the method blocks until a buffer becomes available, after the producer side calls the `producer_release()` method to release a buffer.
`void consumer_release()`	Releases the previously acquired buffer after the consumer finishes reading its input. The released buffer is returned to the producer side so that the producer can write the next set of data. This method is not a blocking call and returns immediately. If the consumer does not hold an acquired buffer when calling this method, the method returns without doing anything and no buffer is released.
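
The acquire/release handshake described in the table can be modelled as buffers moving between a "free for the producer" pool and a "ready for the consumer" pool. The sketch below is a single-threaded bookkeeping model, not the SmartHLS `DoubleBuffer` or `SharedBuffer` class. `BufferHandshakeModel` and its `try_*` methods are hypothetical: where the real methods would block, the model returns `false`. It also assumes that each side holds at most one buffer at a time.

```cpp
#include <cassert>

// Tracks only who may access a buffer, not the buffer contents.
class BufferHandshakeModel {
public:
    // num_buffers: 2 models DoubleBuffer, 1 models SharedBuffer.
    explicit BufferHandshakeModel(int num_buffers) : free_(num_buffers) {
        assert(num_buffers == 1 || num_buffers == 2);
    }
    // Returns false where the real producer_acquire() would block.
    bool try_producer_acquire() {
        if (producer_holds_ || free_ == 0) return false;
        --free_;
        producer_holds_ = true;
        return true;
    }
    void producer_release() {
        if (!producer_holds_) return;  // nothing acquired: no operation
        producer_holds_ = false;
        ++ready_;
    }
    // Returns false where the real consumer_acquire() would block.
    bool try_consumer_acquire() {
        if (consumer_holds_ || ready_ == 0) return false;
        --ready_;
        consumer_holds_ = true;
        return true;
    }
    void consumer_release() {
        if (!consumer_holds_) return;  // nothing acquired: no operation
        consumer_holds_ = false;
        ++free_;
    }

private:
    int free_;       // buffers the producer may acquire
    int ready_ = 0;  // producer-released buffers the consumer may acquire
    bool producer_holds_ = false;
    bool consumer_holds_ = false;
};
```

With one buffer, the producer cannot acquire again until the consumer has acquired and released the buffer. With two buffers, the producer can start filling the second buffer while the first is still waiting for, or being read by, the consumer.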

Class Method	Description
`int sb_count()`	Return the number of single-bit errors detected and corrected.
`int db_count()`	Return the number of double-bit errors detected.
`void reset_counters()`	Reset both single-bit and double-bit counters to 0.
`void scrub()`	Read the memory element-by-element and write each element back. Note that `scrub()` automatically calls `reset_counters()` to reset the error counters.

FPGA Vendor	Device	Default Clock Frequency (MHz)	Default Clock Period (ns)
Microchip	PolarFire®	100	10
Microchip	SmartFusion2	100	10

Function Types		Function Names
Trigonometric functions		cos, cosf, sin, sinf, tan, tanf, acos, acosf, asin, asinf, atan, atanf, atan2
Hyperbolic functions		cosh, coshf, sinh, sinhf, tanh, tanhf, acosh, acoshf, asinh, asinhf, atanh, atanhf
Exponential and logarithmic functions		exp, expf, frexp, log, logf, log10, modf, exp2, expm1, ilogb, log1p, log2, logb, scalbn, scalbln
Power functions		pow, powf, sqrt, hypot, cbrt
Error and gamma functions		erf, erfc, tgamma
Rounding and remainder functions		ceil, floor, fmod, fmodf, trunc, round, lround, llround, rint, lrint, llrint, nearbyint, remainder, remquo
Floating-point manipulation functions		copysign, nan, nextafter
Minimum, maximum, difference functions		fdim, fmax, fmin
Other functions		fabs, fabsf, fma
Implemented as macros in C and as functions in C++	Classification macros or functions	isinf, isnan
Implemented as macros in C and as functions in C++	Comparison macros or functions	isgreater, isgreaterequal, isless, islessequal, islessgreater

Region	Address	Size (bytes)	Description
HLS_ALLOC_CACHED	0xae000000	0x02000000	Cached DDR. Default if region unspecified. Recommended for best overall transfer times.
HLS_ALLOC_NONCACHED_WCB	0xd8000000	0x08000000	Non-cached DDR with write-combine buffer. Slightly better performance than Cached DDR for writes, but worse for reads.
HLS_ALLOC_NONCACHED	0xc0000000	0x08000000	Non-cached DDR. Not recommended (lower performance than other options).

Data Type	Interface Type
Data Type	Memory (default)	AXI4 Initiator	AXI4 Target	Legacy AXI4 Slave
Array	Pointer arguments and global variables of these data types are supported.	Pointer arguments of these data types are supported, but not for global variables.		n/a
Struct				Supports global struct only.
Scalar				n/a

Class Method	Description
`ECC_RAM<data_type, depth, SB_WRITE_BACK, DB_OVERRIDE, DB_DEFAULT>()`	Create a new ECC RAM with the specified parameters.
`operator[i]`	Read/write data at index `i` and handle errors based on the configuration.
`bool read(i, d_i)`	Read data at index `i` and save the read data from memory into `d_i`. Return `true` if the data is correct, `false` otherwise.
`int sb_count()`	Return the number of single-bit errors detected and corrected.
`int db_count()`	Return the number of double-bit errors detected.
`void reset_counters()`	Reset both single-bit and double-bit counters to 0.
`void scrub()`	Read the memory element-by-element and write each element back. Note that `scrub()` automatically calls `reset_counters()` to reset the error counters.
`void inject_error(addr, mask)`	Inject error at the given address based on the masking bits.

Port Name	Direction	Description
clock	IN	The input clock signal to the RTL module.
reset	IN	The input reset signal to the RTL module.
ready	OUT	Indicates the readiness of the RTL module. ready is set to 1 when the RTL module is ready to start a new iteration (invocation) with a new set of inputs.
start	IN	When ready is 1, setting start to 1 will start the execution of the RTL module; When ready is 0, the start signal is ignored by the RTL module.
finish	OUT	finish is set to 1 for one clock cycle when the RTL module finishes.
return_val	OUT	Holds the valid return value when finish is asserted. This signal does not exist if the top-level function has a void return type.

Port Name	Direction	Description
<ARG_NAME>_address_<a\|b>	OUT	The address pointing to the RAM entry that SmartHLS module wants to access.
<ARG_NAME>_read_en_<a\|b>	OUT	Read enable port (n/a for write-only memory).
<ARG_NAME>_read_data_<a\|b>	IN	Read data port (n/a for write-only memory).
<ARG_NAME>_write_en_<a\|b>	OUT	Write enable port (n/a for read-only memory).
<ARG_NAME>_byte_en_<a\|b>	OUT	Byte-enable port. Only available if the memory requires writes to partial bytes of a memory word. (n/a for read-only memory, or when all write operations update the whole memory words).
<ARG_NAME>_write_data_<a\|b>	OUT	Write data port (n/a for read-only memory).

Port Name	Direction	Description
<ARG_NAME>_read_data	IN	The input value of the argument (n/a for write-only memory). The signal is not sampled at the start of circuit execution. The external logic needs to keep the signal stable and valid at any given time during the circuit execution.
<ARG_NAME>_write_data	OUT	The output value of the argument (n/a for read-only memory). The write_data port has valid value only when the write_en signal is high. This port is not available if the SmartHLS circuit never writes to the pointer argument (or global variable).
<ARG_NAME>_write_en	OUT	Indicates the write_data is valid (n/a for read-only memory). This port is not available if the SmartHLS circuit never writes to the pointer argument (or global variable).

Template Parameter		Port Name
`T` is a scalar data type (`pack` is ignored)		<ARG_NAME>, <ARG_NAME>_valid, <ARG_NAME>_ready
`T` is a struct of scalars, e.g., struct MyAxiStream { ap_uint<32> data; ap_uint<8> keep; ap_uint<1> last; };	`pack` = false	<ARG_NAME>_data, <ARG_NAME>_keep, <ARG_NAME>_last, <ARG_NAME>_valid, <ARG_NAME>_ready
	`pack` = true	<ARG_NAME> (41 bits wide), <ARG_NAME>_valid, <ARG_NAME>_ready
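
The 41-bit width of the packed port is the sum of the struct's field widths (32 + 8 + 1). The sketch below illustrates this with ordinary integers. The bit order used here (data in the low bits, then keep, then last) is an assumption made for the illustration and is not taken from this guide; `MyAxiStreamWord`, `pack_word` and `unpack_word` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Plain-integer mirror of the example struct's fields.
struct MyAxiStreamWord {
    uint32_t data;  // 32 bits
    uint8_t keep;   // 8 bits
    bool last;      // 1 bit
};

constexpr unsigned kDataBits = 32;
constexpr unsigned kKeepBits = 8;
constexpr unsigned kLastBits = 1;
constexpr unsigned kPackedBits = kDataBits + kKeepBits + kLastBits;  // 41

// Packs the three fields into one word (field order is an assumption).
uint64_t pack_word(const MyAxiStreamWord& w) {
    return static_cast<uint64_t>(w.data) |
           (static_cast<uint64_t>(w.keep) << kDataBits) |
           (static_cast<uint64_t>(w.last) << (kDataBits + kKeepBits));
}

// Splits a packed word back into its fields.
MyAxiStreamWord unpack_word(uint64_t bits) {
    assert((bits >> kPackedBits) == 0);  // only 41 bits may be set
    MyAxiStreamWord w;
    w.data = static_cast<uint32_t>(bits & 0xFFFFFFFFULL);
    w.keep = static_cast<uint8_t>((bits >> kDataBits) & 0xFF);
    w.last = ((bits >> (kDataBits + kKeepBits)) & 1) != 0;
    return w;
}
```

Whatever the actual field order, the packed port carries the same 41 bits as the three separate ports do when `pack` is false.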

Function Signature	Description
`void * <TopFunc>_setup(uint32_t base_addr = <TopFunc>_BASE_ADDR);`	Registers the <TopFunc> module in a list, and maps the physical address <TopFunc>_BASE_ADDR to a virtual memory address, which is returned for further access by other API functions.
`void <TopFunc>_teardown( );`	Deregisters the <TopFunc> module and unmaps the associated virtual memory address.

Operating System	Function Signature	Description
Linux	`int <TopFunc>_is_idle(void *virt_addr);`	This function returns 1 if the SmartHLS module is idle or has finished the latest invocation.
Baremetal	`int <TopFunc>_is_idle(uint32_t base_addr = <TopFunc>_BASE_ADDR);`
Linux	`void <TopFunc>_start(void *virt_addr);`	This function starts the SmartHLS module. Input arguments including the module's memory-mapped virtual address are expected to have been set before this function is called.
Baremetal	`void <TopFunc>_start(uint32_t base_addr = <TopFunc>_BASE_ADDR);`
Linux	`RETYPE <TopFunc>_join(void *virt_addr);`	This is a blocking function that waits for the completion of the HLS module. The function returns the return value of the SmartHLS function/module (if not void). The RETYPE is a placeholder for the return type of the function.
Baremetal	`RETYPE <TopFunc>_join(uint32_t base_addr = <TopFunc>_BASE_ADDR);`

Operating System	Function Signature	Description
Linux	`void <TopFunc>_write_<ArgName>(TYPE val, void *virt_addr);`	This function writes the value 'val' to the scalar argument <ArgName>. This essentially causes an AXI Memory Map write transaction into the SmartHLS module's on-chip storage.
Baremetal	`void <TopFunc>_write_<ArgName>(TYPE val, uint32_t base_addr = <TopFunc>_BASE_ADDR);`
Linux	`RETYPE <TopFunc>_read_<ArgName>(void *virt_addr);`	This function reads the value of <ArgName>. This causes an AXI Memory Map read transaction from the SmartHLS module's on-chip storage.
Baremetal	`RETYPE <TopFunc>_read_<ArgName>(uint32_t base_addr = <TopFunc>_BASE_ADDR);`
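
The driver functions in the tables above are typically called in a fixed order: set up, write the scalar arguments, start, join, tear down. The sketch below shows that order for the Linux variants. It is self-contained only because it supplies stand-in bodies: the top-level function `int MyTop(int a, int b)`, the `FakeRegs` struct, the base address value, and every function body are hypothetical. In a real project these functions are generated by SmartHLS and talk to hardware.

```cpp
#include <cassert>
#include <cstdint>

// ---- Stand-ins for the generated driver of a hypothetical top-level
// ---- function `int MyTop(int a, int b)`. Bodies are fake.
static const uint32_t MyTop_BASE_ADDR = 0x70000000;  // made-up value

struct FakeRegs {
    int a, b, result;
    bool idle;
};
static FakeRegs g_regs = {0, 0, 0, true};

void* MyTop_setup(uint32_t base_addr = MyTop_BASE_ADDR) {
    (void)base_addr;  // the real function maps this physical address
    return &g_regs;
}
void MyTop_teardown() {}
void MyTop_write_a(int val, void* virt_addr) { static_cast<FakeRegs*>(virt_addr)->a = val; }
void MyTop_write_b(int val, void* virt_addr) { static_cast<FakeRegs*>(virt_addr)->b = val; }
int MyTop_is_idle(void* virt_addr) { return static_cast<FakeRegs*>(virt_addr)->idle ? 1 : 0; }
void MyTop_start(void* virt_addr) {
    FakeRegs* r = static_cast<FakeRegs*>(virt_addr);
    r->result = r->a + r->b;  // the fake "accelerator" finishes immediately
    r->idle = true;
}
int MyTop_join(void* virt_addr) { return static_cast<FakeRegs*>(virt_addr)->result; }

// ---- The call sequence a user program would follow.
int run_accelerator(int a, int b) {
    void* virt = MyTop_setup();     // map the module, get its virtual address
    assert(MyTop_is_idle(virt));    // do not start a busy module
    MyTop_write_a(a, virt);         // scalar arguments are written first
    MyTop_write_b(b, virt);
    MyTop_start(virt);              // non-blocking start
    int result = MyTop_join(virt);  // blocks until done, returns the result
    MyTop_teardown();
    return result;
}
```

Because `<TopFunc>_start` returns immediately, other software work can be placed between the start and join calls while the module runs.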

Operating System	Function Signature	Description
Linux	`void <TopFunc>_memcpy_write_<PtrArg>(void* <PtrArg>, uint64_t byte_size, void *virt_addr);`	These functions perform memory-mapped write/read operations (using the standard `memcpy` function). The CPU itself copies the data between its memory, as pointed to by <PtrArg>, and the SmartHLS module's on-chip storage. The total size to transfer is defined by the 'byte_size' argument. These functions do NOT use DMA.
Baremetal	`void <TopFunc>_memcpy_write_<PtrArg>(void* <PtrArg>, uint64_t byte_size, uint32_t base_addr);`
Linux	`void <TopFunc>_memcpy_read_<PtrArg> (void* <PtrArg>, uint64_t byte_size, void *virt_addr);`
Baremetal	`void <TopFunc>_memcpy_read_<PtrArg> (void* <PtrArg>, uint64_t byte_size, uint32_t base_addr);`
Linux	`void <TopFunc>_dma_write_<PtrArg>(void* <PtrArg>, uint64_t byte_size, void *virt_addr);`	These functions perform memory-mapped write/read operations using the DMA engine in the HSS to move data between the CPU's memory at <PtrArg> and the SmartHLS module's on-chip storage. The total size to transfer is defined by the 'byte_size' argument.
Baremetal	`void <TopFunc>_dma_write_<PtrArg>(void* <PtrArg>, uint64_t byte_size, uint32_t base_addr);`
Linux	`void <TopFunc>_dma_read_<PtrArg> (void* <PtrArg>, uint64_t byte_size, void *virt_addr);`
Baremetal	`void <TopFunc>_dma_read_<PtrArg> (void* <PtrArg>, uint64_t byte_size, uint32_t base_addr);`

Operating System	Function Signature	Description
Linux	`void <TopFunc>_write_<PtrArg>_ptr_addr(void* arg_virt_addr, void *virt_addr);`	This function sets the address for <PtrArg> using `arg_virt_addr`. The `virt_addr` argument is the memory-mapped virtual base address of the top-level module. `arg_virt_addr` is a virtual address and internally it will be mapped to a physical address before sending it to the SmartHLS module, which uses that address to access the content of <PtrArg>. When the SmartHLS project's type is set to `Icicle_SoC` (see set_project), the driver is assumed to run on a Linux Operating System and the CPU's memory referenced by the pointer argument <PtrArg> must be allocated using `hls_malloc()` function (see Memory Allocation Library) and released using `hls_free()`.
Baremetal	`void <TopFunc>_write_<PtrArg>_ptr_addr(void* arg_virt_addr, uint32_t base_addr);`	This function sets the address for <PtrArg> using `arg_virt_addr`, and the `base_addr` argument is the physical memory base address of the top-level module.

v2024.2

3.5.1.1.1 Instruction-level Parallelism

3.5.1.1.2 Loop-level Parallelism

3.5.1.1.3 Thread-level Parallelism

3.5.1.1.4 Data Flow (Streaming) Parallelism

3.5.1.2 SmartHLS Overview

3.5.1.3 SmartHLS SoC Flow

3.5.1.4 SmartHLS Pragmas

3.5.1.5 SmartHLS Constraints

3.5.1.6 Specifying the Top-level Function

3.5.1.7 Simulate HLS Hardware (SW/HW Co-Simulation)

3.5.1.8 Automatic On-Chip Instrumentation

3.5.1.8.1 Introduction

Configuration File

3.5.1.8.2 Modes

Debugging Mode

Monitoring Mode

3.5.1.8.3 Instrumentation Flows

Using Instrumentation in the SmartHLS SoC Flow

Using a Custom Instrumentation Flow

User-defined instrumentation script

3.5.1.8.4 Generated Files

3.5.1.9 Loop Pipelining

3.5.1.10 Loop Dependence

3.5.1.11 Function Pipelining

3.5.1.12 Data Flow Parallelism

3.5.1.12.1 Dataflow Example: Canny with FIFOs

3.5.1.12.2 Dataflow Example: Diamond

3.5.1.13 Multi-threading with SmartHLS Threads

3.5.1.14 Supported HLS Thread APIs

3.5.1.15 Data Flow Parallelism with SmartHLS Threads

3.5.1.15.1 Further Throughput Enhancement with Loop Pipelining

3.5.1.16 Memory Partitioning

3.5.1.16.1 Access-Based Memory Partitioning

3.5.1.16.2 User-Specified Memory Partitioning

Block Partitioning

Cyclic Partitioning

Complete Partitioning

Struct-Fields Partitioning

3.5.1.17 Struct Support

Example

3.5.1.17.1 Struct Packing

3.5.1.17.2 Struct Partitioning

3.5.1.17.3 Return Struct By Value

3.5.1.17.4 Default Struct Modes

3.5.1.17.5 Limitations

3.5.1.18 Error Correction Code

3.5.1.18.1 SmartHLS™ ECC Library

3.5.1.18.2 Error Emulation

3.5.1.18.3 Error Simulation

3.5.1.18.4 ECC Support Status

3.5.1.19 SmartHLS C/C++ Library

3.5.1.19.1 Streaming Library

Streaming Library - Blocking Behaviour

Streaming Library - Non-Blocking Behaviour

3.5.1.19.2 C++ Arbitrary Precision Data Types Library

3.5.1.19.3 C++ Arbitrary Precision Integer Library

Printing Arbitrary Precision integers

Initializing Arbitrary Precision integers

C++ Arbitrary Precision Integer Arithmetic

C++ Arbitrary Precision Integer Explicit Conversions

3.5.1.19.4 C++ Arbitrary Precision Bit-level Operations

Selecting and Assigning to a Range of Bits

Bit Concatenation

3.5.1.19.5 C++ Arbitrary Precision Fixed Point Library

Printing ap_[u]fixpt Types

Initializing ap_[u]fixpt Types

Arithmetic With ap_[u]fixpt Types

Explicit Conversions of ap_[u]fixpt

3.5.1.19.6 Supported Operations in ap_[u]int, ap_[u]fixpt, and floating-point

3.5.1.19.7 C++ Double Buffer and Shared Buffer

Using DoubleBuffer and SharedBuffer in a multi-threaded dataflow design

3.5.1.19.8 Image Processing Library

Line Buffer

Error Correction Code for Line Buffer

3.5.1.19.9 Standard C Math Library (math.h)

3.5.1.19.10 Standard C Library Assertions (assert.h)

User Example

Using `DoubleBuffer` and `SharedBuffer` in a multi-threaded dataflow design

Operating System	Function Signature	Description
Linux	`RETYPE <TopFunc>_hls_driver(..., uint32_t base_addr = <TopFunc>_BASE_ADDR);`	This function initializes all input argument data, starts the SmartHLS module, waits for its completion, and retrieves the output argument data and return value of the function. It can be used as a direct replacement to the original top-level function, and has the same arguments and return type as the top-level function.
Baremetal
Linux	`void <TopFunc>_write_input_and_start(..., void *virt_addr);`	This function initializes all input argument data and starts the SmartHLS module. It is a non-blocking call that can be used to start the SmartHLS module and continue to execute other parts of the software while the SmartHLS module is running. The arguments of this function include the input arguments of the top-level function. When DMA is used for Baremetal, the physical base address of the Soft-DMA core IP (`DMA_ADDR_<HLS_PROJ_NAME>`) is also passed as an argument.
Baremetal	Without DMA: `void <TopFunc>_write_input_and_start(..., uint32_t base_addr = <TopFunc>_BASE_ADDR);`
Baremetal	With DMA: `void <TopFunc>_write_input_and_start(..., uint32_t base_addr = <TopFunc>_BASE_ADDR, uint32_t dma_addr = DMA_ADDR_<HLS_PROJ_NAME>);`
Linux	`RETYPE <TopFunc>_join_and_read_output(..., void *virt_addr);`	This blocking function waits for the SmartHLS module to finish the execution, and retrieves output argument data and return value (if not void). The arguments are the same arguments of the top-level function. When DMA is used for Baremetal, the physical base address of the Soft-DMA core IP (`DMA_ADDR_<HLS_PROJ_NAME>`) is also passed as an argument.
Baremetal	Without DMA: `RETYPE <TopFunc>_join_and_read_output(..., uint32_t base_addr = <TopFunc>_BASE_ADDR);`
Baremetal	With DMA: `RETYPE <TopFunc>_join_and_read_output(..., uint32_t base_addr = <TopFunc>_BASE_ADDR, uint32_t dma_addr = DMA_ADDR_<HLS_PROJ_NAME>);`

Macro Name	Description
DMA_ADDR_<HLS_PROJ_NAME>	The physical address of the Soft-DMA IP core. This definition is only generated if the SmartHLS project includes the Soft-DMA IP core, for example, when the project type is MiV_SoC. When using the RISC-V processors in the HSS, SmartHLS will use the built-in hardened DMA engine in the HSS.
<TopFunc>_BASE_ADDR	The physical base address of the SmartHLS module <TopFunc>. This address is automatically computed by SmartHLS for each top-level function in the HLS project if using the SmartHLS SoC Flow.
<TopFunc>_SPAN_ADDR	The address span or address space size required for the SmartHLS module <TopFunc>. The address space required for a given SmartHLS module depends on the number and type of arguments in the C++ top-level function.

Parameter name	Default value	Description
SOC_BD_NAME	FIC_0_PERIPHERALS	The name of the SmartDesign project into which the SmartHLS IP modules will be integrated.
SOC_DMA_ENGINE	HARD_DMA	Determines the type of DMA engine to use. HARD_DMA: will use the DMA available in the PolarFire SoC MSS. This implies that the CPUs to be used are the RISC-V U54 application cores in the MSS. SOFT_DMA: will automatically instantiate a DMA engine on the FPGA fabric and be connected to the SmartHLS AXI interconnect along with the SmartHLS IP modules. Currently this option is not enabled, but will be in a future release.
SOC_AXI_INITIATOR	AXI2AXI_TO_HLS:AXI4_MASTER	Identifies the downstream AXI interface to use. This is used for register control and any data write and read transfers initiated by the CPU down to the SmartHLS IP modules.
SOC_AXI_TARGET	AXI2AXI_FROM_HLS:AXI4_SLAVE	Identifies the upstream AXI interface to use. This is used for write and read transfer requests issued by the SmartHLS IP modules targeting the CPU memory.
SOC_RESET	ARESETN	Identifies the reset signal to be used. Important: reset polarity must be active-High.
SOC_CLOCK	ACLK	Identifies the clock to use for the SmartHLS IP modules. Currently, the same clock is used for all modules.
SOC_FABRIC_BASE_ADDRESS	0x70000000	This is the base address of a memory window in the CPU memory address space that is reserved for all SmartHLS modules instantiated on the FPGA fabric. Control registers and on-chip memory buffers are allocated and mapped from this memory window. This address is also used to configure the HLS AXI interconnect to allow AXI transactions to move downstream from the CPU towards the SmartHLS IP modules. Important: The address is a hexadecimal value and the ‘0x’ prefix must be included.
SOC_FABRIC_SIZE	0x400000	Determines the size of the memory window used for mapping control registers and on-chip buffers for ALL modules in a given SmartHLS project instantiated on the fabric. The size can be larger than what a specific function may need. For example, a 4MB memory window could be reserved but the IP module may only use half of it, leaving the other half for future use. Reserving a larger window does not mean more on-chip memory will be used. Important: The size is a hexadecimal value and the ‘0x’ prefix must be included.
SOC_CPU_MEM_BASE_ADDRESS	0x80000000	This base address identifies the beginning of a memory window in the CPU physical memory address space that the SmartHLS IP modules can use when they are AXI Initiators. This address is used to configure the HLS AXI interconnect and allow transactions to move upstream towards the CPU’s memory. Important: The address is a hexadecimal value and the ‘0x’ prefix must be included.
SOC_CPU_MEM_SIZE	0x60000000	This is the size of the CPU memory window used when the SmartHLS IP modules act as AXI initiators. Important: The size is a hexadecimal value and the ‘0x’ prefix must be included.
SOC_POLL_DELAY	0	Controls how often the hardware driver polls the modules to check for completion. The value is in microseconds.
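
The two memory windows above are each defined by a base address and a size, so their extents follow from simple arithmetic. The following is a minimal C++ sketch, assuming the default values from the table; the `Window` type and the constants `kFabricWindow` and `kCpuMemWindow` are hypothetical names introduced here for illustration and are not part of SmartHLS.

```cpp
#include <cstdint>

// Hypothetical helper (not a SmartHLS type): a half-open address window
// covering [base, base + size).
struct Window {
    std::uint64_t base;
    std::uint64_t size;

    // First address past the end of the window.
    constexpr std::uint64_t end() const { return base + size; }

    // True if addr lies inside the window.
    constexpr bool contains(std::uint64_t addr) const {
        return addr >= base && addr < end();
    }

    // True if the two windows share at least one address.
    constexpr bool overlaps(const Window &other) const {
        return base < other.end() && other.base < end();
    }
};

// Default values taken from the table above.
// SOC_FABRIC_BASE_ADDRESS / SOC_FABRIC_SIZE
constexpr Window kFabricWindow{0x70000000ULL, 0x400000ULL};
// SOC_CPU_MEM_BASE_ADDRESS / SOC_CPU_MEM_SIZE
constexpr Window kCpuMemWindow{0x80000000ULL, 0x60000000ULL};

// With the defaults, the fabric window spans 4 MiB and ends at 0x70400000,
// the CPU memory window ends at 0xE0000000, and the two do not overlap.
static_assert(kFabricWindow.size == 4ULL * 1024 * 1024, "fabric window is 4 MiB");
static_assert(kFabricWindow.end() == 0x70400000ULL, "fabric window end");
static_assert(kCpuMemWindow.end() == 0xE0000000ULL, "CPU memory window end");
static_assert(!kFabricWindow.overlaps(kCpuMemWindow), "windows are disjoint");
```

A check of this kind can be useful when changing the defaults: for example, `kFabricWindow.contains(addr)` indicates whether an address would be routed downstream to the SmartHLS IP modules under the assumptions above.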

Command	Description
soc_base_proj_program	Programs a pre-built base project bitstream to an attached Icicle Kit.
soc_sw_compile_no_accel	Cross-compiles user software (with no accelerators) to a binary targeting the RISC-V processor on PolarFire® SoC.
soc_base_proj_run	Moves the RISC-V binary generated by soc_sw_compile_no_accel to an Icicle Kit on the network and runs the binary on the Icicle Kit. Requires the BOARD_IP environment variable to be set.
soc_accel_proj_generate	Generates a reference SoC Libero/SmartDesign project, containing an MSS connected to SmartHLS™-generated hardware accelerators. See SmartHLS Reference SoC for more information.
soc_accel_proj_rtl_synth	Runs RTL synthesis on the generated reference SoC project. Reports resource utilization (see Report Files).
soc_accel_proj_pnr	Runs place and route on the generated reference SoC project. Reports resource utilization and timing results (see Report Files).
soc_accel_proj_program	Programs the generated reference SoC project to an attached Icicle Kit. Requires the PROGRAMMER_ID environment variable to be set.
soc_sw_compile_accel	Transforms user software by replacing top-level function calls with calls to the Top-level Driver Functions, and cross-compiles the transformed software to a binary targeting the RISC-V processor on PolarFire SoC.
soc_accel_proj_run	Moves the RISC-V binary generated by soc_sw_compile_accel to an Icicle Kit on the network and runs the binary on the Icicle Kit. Requires the BOARD_IP environment variable to be set.
soc_profiler_view	Displays a runtime plot and prints a summary table of the profiling data gathered during runtime execution for each accelerator in the project.

Variable	Description
SRCS	The `SRCS` variable in `Makefile` should list all the source files (`.cpp` or `.c`). Header files should not be added to `SRCS`, but should be properly included by the source files. `shls init` will automatically add the file names of the existing source files in the current directory. If source files are created after `shls init`, please update `SRCS` (e.g., `SRCS = foo.cpp bar.cpp`).
NAME	The `NAME` variable stands for the project name, e.g., `NAME = MY_PROJECT`. The default project name is the current directory name when `NAME` is not specified in `Makefile`.
BOARD_PATH	Specifies the path on the board where the test will be run. The default is `/home/root/`. `BOARD_PATH` is always prefixed by `/home/root/` as programs are run as root on the board.
PROGRAM_ARGUMENTS	`PROGRAM_ARGUMENTS` can be used to specify the arguments of the software testbench (i.e., `int main(int argc, char *argv[])`). Here is an example: `PROGRAM_ARGUMENTS = input_file.bmp golden_output_file.bmp`. More details can be found in Simulate HLS Hardware (SW/HW Co-Simulation).
NUM_LIBERO_PnR_PASSES	Defines the maximum number of Place and Route passes to go through in order to meet the timing requirement.
INPUT_FILES_RISCV	Specifies the input files of the program to be copied to the development board, separated by spaces. The path used for the files should be based on the path on the local development machine. For example: `INPUT_FILES_RISCV = lane3.avi lane3_golden.txt` This will copy `lane3.avi` and `lane3_golden.txt` from the project folder to the Icicle board before running the program on the board.
OUTPUT_FILES_RISCV	Specifies the output files of the program run on the development board, separated by spaces. The path used for the files should be based on the path on the development board. Example: `OUTPUT_FILES_RISCV = output.avi output.txt` This will copy `output.avi` and `output.txt` from the Icicle board to the project folder after running the program on the board.
USER_CXX_FLAG	Additional flags used for compilation, such as `-I`. Example: `USER_CXX_FLAG = -I$(OPENCV_PATH)/include/opencv4` The above compiler option will be added to all compile commands to include the OpenCV include directory.
USER_CXX_FLAG_RISCV	Additional flags used for compilation, such as `-I`. `USER_CXX_FLAG_RISCV` defaults to `USER_CXX_FLAG`. Defining `USER_CXX_FLAG_RISCV` overrides the default, including when it is defined but empty. Define this variable to add specific flags for compiling the binary that runs on the on-board RISC-V processor. Example: `USER_CXX_FLAG_RISCV = -I$(OPENCV_PATH)/include/opencv4` The above compiler option will be added to all compile commands to include the OpenCV include directory.
USER_LINK_FLAG	Flags for linking dynamic libraries, such as `-L` and `-l`. Example: `USER_LINK_FLAG = -L$(FFMPEG_PATH)/lib -lavcodec` The above linker options will link FFMPEG’s avcodec library.
USER_LINK_FLAG_RISCV	Flags for linking dynamic libraries, such as `-L` and `-l`. `USER_LINK_FLAG_RISCV` defaults to `USER_LINK_FLAG`. Defining `USER_LINK_FLAG_RISCV` overrides the default, including when it is defined but empty. Define this variable to add specific flags for linking the binary that runs on the on-board RISC-V processor. Example: `USER_LINK_FLAG_RISCV = -L$(FFMPEG_PATH)/lib -lavcodec` The above linker options will link FFMPEG’s avcodec library for the RISC-V processor.
USER_ENV_VARS	Sets the environment variables used when running the program on the development host. Example: `USER_ENV_VARS = LD_LIBRARY_PATH=$(OPENCV_PATH)/lib` The above setting will set `LD_LIBRARY_PATH` to `$(OPENCV_PATH)/lib` when running the program. Windows users should specify `PATH` instead of `LD_LIBRARY_PATH` for linking libraries, like so: `USER_ENV_VARS = PATH=$(OPENCV_PATH)/bin`
USER_ENV_VARS_RISCV	Sets the environment variables used when running the program on the RISC-V processor on the development board. `USER_ENV_VARS_RISCV` defaults to `USER_ENV_VARS`. Defining `USER_ENV_VARS_RISCV` overrides the default, including when it is defined but empty. Example: `USER_ENV_VARS_RISCV = LD_LIBRARY_PATH=$(OPENCV_PATH)/lib` The above setting will set `LD_LIBRARY_PATH` to `$(OPENCV_PATH)/lib` when running the program.
HLS_PATH_SEP	Automatically set to `:` when running on Linux or `;` when running on Windows.
HLS_OS	Automatically set to `linux` when running on Linux or `win` when running on Windows.
HLS_INSTRUMENT_ENABLE	Used to enable instrumentation when using the SmartHLS SoC flow. When set to 1 and `shls soc_accel_proj_pnr` is run, SmartHLS will check if an `instrument_conf.json` file exists. If it does not exist, a new file is created with a default log level of 2 and a default FIFO log level of 0 for all modules. See Using a Custom Instrumentation Flow when using the SmartHLS IP flow.
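
Several of the variables above are typically set together in a project `Makefile`. The fragment below is an illustrative sketch only: it reuses the example values from the table, assumes that `OPENCV_PATH` is defined elsewhere by the user, and shows only the user-editable assignments, not the rest of the file generated by `shls init`.

```make
# Illustrative fragment; values are the examples from the table above.
NAME = MY_PROJECT
SRCS = foo.cpp bar.cpp

# OPENCV_PATH is an assumption: it must be defined by the user.
USER_CXX_FLAG = -I$(OPENCV_PATH)/include/opencv4
USER_ENV_VARS = LD_LIBRARY_PATH=$(OPENCV_PATH)/lib

# Files copied to the board before the run, and back to the project
# folder after the run.
INPUT_FILES_RISCV = lane3.avi lane3_golden.txt
OUTPUT_FILES_RISCV = output.avi output.txt

# Enable instrumentation for the SoC flow.
HLS_INSTRUMENT_ENABLE = 1
```

Because the `_RISCV` variants are left undefined here, they fall back to `USER_CXX_FLAG` and `USER_ENV_VARS` as described in the table.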