3.5.2.4 Memory Partitioning

Memory partitioning is an optimization in which aggregate types, such as arrays and structs, are split into smaller pieces so that more reads and writes (accesses) can occur per cycle. SmartHLS instantiates a RAM for each aggregate type, and each RAM has up to two ports, allowing up to two accesses per cycle. Partitioning an aggregate into smaller memories, or into its individual elements, allows more memory accesses per cycle and improves memory bandwidth.

Memory partitioning is described in more detail in 3.5.1.14 Memory Partitioning.

Memory partitioning can improve performance when a pipeline has a high initiation interval (II) because of memory contention.

int data[MAX_ITER];

...

#pragma HLS loop pipeline
for (iter = 0; iter < (MAX_ITER-3); iter++) {
    long long x0 = data[iter];
    long long x1 = data[iter+1];
    long long y0 = data[iter+2];
    long long y1 = data[iter+3];
    result += x0 * x1 + y0 * y1;
}

In the pipelined code above, each loop iteration makes four accesses to the array data, but the memory has only two ports. All four accesses of one iteration must be issued before the next loop iteration can start. With two accesses possible per cycle and four accesses required per iteration, the II of the pipeline will be 2. For more information on pipelines and II, please refer to the 3.5.2.1 Loop Pipelining section.
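The II of 2 quoted above follows from a simple resource bound: the number of accesses per iteration divided by the number of memory ports, rounded up. The helper below is only a sketch of that arithmetic. The name min_ii_for_ports is hypothetical and is not part of SmartHLS, and the II reported by the tool may be higher when other constraints, such as loop-carried dependencies, also apply.

```c
/* Hypothetical helper (not a SmartHLS API): lower bound on the pipeline
 * initiation interval imposed by memory ports alone.
 * accesses_per_iter: accesses to one memory in a single loop iteration.
 * ports:             ports available on that memory (up to 2 per RAM). */
unsigned min_ii_for_ports(unsigned accesses_per_iter, unsigned ports)
{
    if (ports == 0) {
        return 0; /* invalid input: no port, no schedule exists */
    }
    if (accesses_per_iter == 0) {
        return 1; /* the memory places no constraint on the pipeline */
    }
    /* Ceiling division: every cycle can serve at most `ports` accesses. */
    return (accesses_per_iter + ports - 1) / ports;
}
```

For the loop above, min_ii_for_ports(4, 2) evaluates to 2, which matches the II described in the text. If the four reads were served by four independent storage elements instead of one dual-port RAM, the same bound would drop to 1.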

To help improve memory bandwidth, SmartHLS automatically analyzes the access patterns of each memory and attempts to split the memory into separate partitions where possible. In the above example, automatic access-based partitioning cannot partition the data array, because the four accesses in the loop overlap and together span the entire memory. In this case, user-specified partitioning is the better choice: it eliminates the memory contention and allows the pipeline to achieve an II of 1.
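To see why the four accesses overlap, it can help to write out the index range each one touches over the whole loop. The sketch below does this for an assumed MAX_ITER of 16. That value, the index_range type, and both function names are illustrative only and do not reflect how SmartHLS performs its analysis internally.

```c
#define MAX_ITER 16 /* assumed size, chosen only for illustration */

/* Inclusive range of array indices touched by one access. */
typedef struct {
    int first;
    int last;
} index_range;

/* Indices read by data[iter + offset] over the loop
 *   for (iter = 0; iter < (MAX_ITER-3); iter++)
 * The last value of iter is MAX_ITER-4. */
index_range access_range(int offset)
{
    index_range r = { offset, (MAX_ITER - 4) + offset };
    return r;
}

/* Returns 1 when two inclusive ranges share at least one index. */
int ranges_overlap(index_range a, index_range b)
{
    return a.first <= b.last && b.first <= a.last;
}
```

With these assumptions, data[iter] touches indices 0 to 12 and data[iter+3] touches indices 3 to 15. Every pair of accesses shares elements, and together they cover indices 0 to MAX_ITER-1, so there is no split of the array into disjoint regions that gives each access its own memory.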

#pragma HLS memory partition variable(data)
int data[MAX_ITER];

...

#pragma HLS loop pipeline
for (iter = 0; iter < (MAX_ITER-3); iter++) {
    long long x0 = data[iter];
    long long x1 = data[iter+1];
    long long y0 = data[iter+2];
    long long y1 = data[iter+3];
    result += x0 * x1 + y0 * y1;
}

In the above snippet, the array data is specified to be completely partitioned. By default, the partition dimension is set to 0. This means the memory will be partitioned into individual elements along the right-most dimension. The partition dimension can be changed with the optional dim parameter (see Partition Memory). SmartHLS analyzes the index range of each instruction that accesses the array and creates partitions only for the accessed elements. Unaccessed partitions are discarded. For user-specified partitioning, each access is modified to select the correct partition based on the index at runtime.
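As a rough mental model of complete partitioning, a small array can be pictured as one independent storage element per index, with each run-time-indexed read choosing among them. The C code below is a conceptual software analogy only: SmartHLS generates hardware rather than this source, and the names data_p0 to data_p3 and read_partitioned, along with the stored values, are hypothetical.

```c
/* Conceptual model only, not SmartHLS output: a 4-element array that has
 * been completely partitioned into one storage element per index. */
static int data_p0 = 10;
static int data_p1 = 20;
static int data_p2 = 30;
static int data_p3 = 40;

/* A read with a run-time index becomes a selection among the partitions.
 * In hardware this selection corresponds to a multiplexer. */
int read_partitioned(unsigned idx)
{
    switch (idx) {
    case 0:  return data_p0;
    case 1:  return data_p1;
    case 2:  return data_p2;
    case 3:  return data_p3;
    default: return 0; /* out of range: arbitrary value in this model */
    }
}
```

Because each element is stored separately in this model, reads of different indices no longer compete for the two ports of a single RAM, which is what lets all four reads of an iteration proceed together.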

When applied to memories with high contention in deep pipelines, memory partitioning can greatly improve circuit performance by reducing the pipeline II.