3.5.1.17.5 C++ Arbitrary Precision Fixed Point Library

The C++ Arbitrary Precision Fixed Point library provides fast, bit-accurate software simulation and efficient equivalent hardware generation. The C++ ap_[u]fixpt types allow specifying signed and unsigned fixed point numbers of arbitrary width, with the word at an arbitrary fixed position relative to the decimal point. They can be used for arithmetic, concatenation, and bit-level operations. To use the ap_[u]fixpt types, include the following header file.

#include "hls/ap_fixpt.hpp"

The ap_[u]fixpt template specifies the width of the type, how far the most significant bit is above the decimal point, and the quantization and overflow modes.

Quantization and overflow handling is triggered during assignment and construction. The policies used for quantization and overflow are based on the quantization and overflow modes of the left-hand side of an assignment, or of the value being constructed.

The template ap_[u]fixpt<W, I_W, Q_M, O_M> is described in the following table. The last two template parameters are optional.

Parameter   Description
W           The width of the word in bits.
I_W         How far the most significant bit is above the decimal. I_W can be negative. I_W > 0 implies the MSB is above the decimal. I_W <= 0 implies the MSB is below the decimal.
            If W >= I_W >= 0, then I_W is the number of bits used for the integer portion.
Q_M         The quantization (rounding) mode used when a result has precision below the least significant bit.
            Defaults to AP_TRN.
              AP_TRN          Truncate bits below the LSB bringing the result closer to -∞.
              AP_TRN_ZERO     Truncate bits below the LSB bringing the result closer to zero.
              AP_RND          Round to the nearest representable value with the mid-point going towards +∞.
              AP_RND_INF      Round to the nearest representable value with the mid-point going towards -∞ for negative numbers, and +∞ for positive numbers.
              AP_RND_MIN_INF  Round to the nearest representable value with the mid-point going towards -∞.
              AP_RND_ZERO     Round to the nearest representable value with the mid-point going towards 0.
              AP_RND_CONV     Round to the nearest representable value with the mid-point going towards the nearest even multiple of the quantum. (This helps to remove bias in rounding.)
O_M         The overflow mode used when a result exceeds the maximum or minimum representable value.
            Defaults to AP_WRAP.
              AP_WRAP         Wraparound between the minimum and maximum representable values in the range.
              AP_SAT          On positive and negative overflow saturate the result to the maximum or minimum value in the range respectively.
              AP_SAT_ZERO     On any overflow set the result to zero.
              AP_SAT_SYM      On positive and negative overflow saturate the result to the maximum or minimum value in the range symmetrically about zero. For ap_ufixpt this is the same as AP_SAT.
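
As a rough illustration of how the quantization modes listed above differ, the following sketch models them on plain integers in standard C++. It is not the library's implementation: the Quant enum and the quantize function are hypothetical names introduced here, and the input is assumed to be an integer word that carries n extra fractional bits which must be dropped.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the Q_M modes; not the library's own enum.
enum class Quant { Trn, TrnZero, Rnd, RndInf, RndMinInf, RndZero, RndConv };

// Drops the n lowest bits of v and returns the quantized word.
// v is expressed in units of 2^-n of the target quantum.
int64_t quantize(int64_t v, int n, Quant mode) {
    assert(n >= 1 && n < 62);
    const int64_t d = int64_t(1) << n;

    // Floor division: C++ '/' truncates toward zero, so adjust negatives.
    int64_t fl = v / d;
    if (v % d < 0) --fl;

    const int64_t rem = v - fl * d;   // dropped bits, 0 <= rem < d
    const int64_t half = d / 2;       // exact mid-point between two results
    const bool tie = (rem == half);
    const bool above = (rem > half);

    switch (mode) {
    case Quant::Trn:       return fl;                                  // toward -inf
    case Quant::TrnZero:   return fl + ((v < 0 && rem != 0) ? 1 : 0);  // toward zero
    case Quant::Rnd:       return fl + ((above || tie) ? 1 : 0);       // tie toward +inf
    case Quant::RndInf:    return fl + ((above || (tie && v >= 0)) ? 1 : 0); // tie away from zero
    case Quant::RndMinInf: return fl + (above ? 1 : 0);                // tie toward -inf
    case Quant::RndZero:   return fl + ((above || (tie && v < 0)) ? 1 : 0);  // tie toward zero
    case Quant::RndConv:   return fl + ((above || (tie && fl % 2 != 0)) ? 1 : 0); // tie to even
    }
    return fl;  // not reached
}
```

With n = 2 the input is in quarters, so 10 stands for 2.5 and -10 for -2.5. Under these assumptions, quantize(-10, 2, mode) gives -3 for Quant::Trn, -2 for Quant::TrnZero, -2 for Quant::Rnd, -3 for Quant::RndInf, and -2 for Quant::RndConv.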

An ap_[u]fixpt is a W-bit wide integer, in 2's complement for the signed case, which has a fixed position relative to the decimal point. This means that arithmetic is efficiently implemented as integer operations with some shifting to line up the decimal points. Generally, a fixed point number can be thought of as a signed or unsigned integer word multiplied by 2^(I_W - W). For example, an ap_fixpt<8, 4> whose word holds the integer 20 represents 20 * 2^(-4) = 1.25. The range of values that an ap_[u]fixpt can take on, as well as the quantum that separates those values, is determined by the W and I_W template parameters. The AP_SAT_SYM overflow mode forces the range to be symmetrical about zero for signed fixed point types. This information is described in the following table, where Q represents the quantum.

Type        Quantum (Q)    Range                              AP_SAT_SYM Range
ap_ufixpt   2^(I_W - W)    0 to 2^(I_W) - Q                   0 to 2^(I_W) - Q
ap_fixpt    2^(I_W - W)    -2^(I_W - 1) to 2^(I_W - 1) - Q    -2^(I_W - 1) + Q to 2^(I_W - 1) - Q
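
The ranges in this table determine what each overflow mode produces. The sketch below models the four modes for a signed W-bit word in standard C++, working on the raw integer word (the represented value is the word multiplied by Q). The Overflow enum and handle_overflow function are hypothetical names, not the library's API, and the AP_SAT_SYM case assumes that every value below the symmetric minimum, including the most negative word itself, is clamped to that minimum.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the O_M modes; not the library's own enum.
enum class Overflow { Wrap, Sat, SatZero, SatSym };

// v is an unbounded result expressed as an integer multiple of Q.
// Returns the word a signed W-bit target would hold under the given mode.
int64_t handle_overflow(int64_t v, int W, Overflow mode) {
    assert(W >= 2 && W <= 32);
    const int64_t hi = (int64_t(1) << (W - 1)) - 1;  // largest word
    const int64_t lo = -hi - 1;                      // smallest word
    const bool overflowed = (v < lo || v > hi);

    switch (mode) {
    case Overflow::Wrap: {
        // Keep the low W bits, then reinterpret them as 2's complement.
        const uint64_t span = uint64_t(1) << W;
        int64_t r = static_cast<int64_t>(static_cast<uint64_t>(v) & (span - 1));
        if (r > hi) r -= static_cast<int64_t>(span);
        return r;
    }
    case Overflow::Sat:
        return v < lo ? lo : (v > hi ? hi : v);
    case Overflow::SatZero:
        return overflowed ? 0 : v;
    case Overflow::SatSym:
        // Assumption: clamp to the symmetric range [-hi, hi].
        return v < -hi ? -hi : (v > hi ? hi : v);
    }
    return v;  // not reached
}
```

For an ap_fixpt<8, 4>-like word (Q = 0.0625, words from -128 to 127), the out-of-range word 130 (8.125) becomes -126 (-7.875) when wrapped, 127 (7.9375) when saturated, and 0 under the zero-on-overflow mode.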

Some ap_[u]fixpt ranges are demonstrated in the following table.

Type               Quantum     Range
ap_fixpt<8, 4>     0.0625      -8 to 7.9375
ap_ufixpt<4, 12>   256         0 to 3840
ap_ufixpt<4, -2>   0.015625    0 to 0.234375
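
The values in this table follow directly from the quantum and range formulas given earlier. As a quick cross-check, the following standard C++ sketch evaluates those formulas for a given W and I_W. The FixptRange struct and fixpt_range function are hypothetical helpers written for this illustration and are not part of the library.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type: quantum and value range of a fixed point format.
struct FixptRange {
    double quantum;
    double min;
    double max;
};

// Evaluates Q = 2^(I_W - W) and the corresponding range.
// sat_sym selects the symmetric range used by AP_SAT_SYM (signed types only).
FixptRange fixpt_range(int W, int I_W, bool is_signed, bool sat_sym = false) {
    assert(W > 0);
    const double q = std::ldexp(1.0, I_W - W);       // 2^(I_W - W)
    if (!is_signed) {
        return {q, 0.0, std::ldexp(1.0, I_W) - q};   // 0 to 2^(I_W) - Q
    }
    const double top = std::ldexp(1.0, I_W - 1);     // 2^(I_W - 1)
    return {q, sat_sym ? -top + q : -top, top - q};
}
```

Calling fixpt_range(8, 4, true) yields a quantum of 0.0625 and a range of -8 to 7.9375, and fixpt_range(4, -2, false) yields a quantum of 0.015625 and a range of 0 to 0.234375, matching the rows above.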

An example using ap_fixpt is shown below.

#include "hls/ap_fixpt.hpp"
#include "hls/streaming.hpp"
#define TAPS 8

// A signed fixed point type with 10 integer bits and 6 fractional bits
// It employs convergent rounding for quantization, and saturation for overflow.
typedef hls::ap_fixpt<16, 10, hls::AP_RND_CONV, hls::AP_SAT> fixpt_t;

// A signed fixed point type with 3 integer bits and 1 fractional bit
// It uses the default truncation, and wrapping modes.
typedef hls::ap_fixpt<4, 3> fixpt_s_t;

void fir(hls::FIFO<fixpt_t> &input_fifo, hls::FIFO<fixpt_t> &output_fifo) {
#pragma HLS function top
#pragma HLS function pipeline
    fixpt_t in = input_fifo.read();

    static fixpt_t previous[TAPS] = {0};
    const fixpt_s_t coefficients[TAPS] = {-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2};

    for (unsigned i = (TAPS - 1); i > 0; --i) {
        previous[i] = previous[i - 1];
    }

    previous[0] = in;

    fixpt_t accumulate[TAPS];
    for (unsigned i = 0; i < TAPS; ++i) {
        accumulate[i] = previous[i] * coefficients[i];
    }

    // Accumulate results, doing adds and saturation in
    // a binary tree to reduce the number of serial saturation
    // checks. This significantly improves pipelining results
    // over serially adding results together when saturation
    // is required.
    for (unsigned i = TAPS >> 1; i > 0; i >>= 1) {
        for (unsigned j = 0; j < i; ++j) {
            accumulate[j] += accumulate[j + i];
        }
    }

    output_fifo.write(accumulate[0]);
}

This example implements a streaming FIR filter with 8 taps. Using the minimum-width ap_fixpt to represent the constant coefficients allows the multiply to happen at a smaller width than if the coefficients had the same (wider) type as the inputs. The example ensures that overflow never wraps around by always assigning to an ap_fixpt that uses the AP_SAT overflow mode. Saturation does incur a performance penalty, but the penalty is minimized here by accumulating the results in a binary tree, so that only log2(TAPS) = 3 saturating operations depend on each other. If the results were accumulated in a single variable in one loop, there would be TAPS = 8 saturating operations depending on each other. A longer chain of dependent saturating operations is slower because at each step overflow must be checked before the next operation can occur.
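
The difference in dependency depth can be sketched in plain C++ without the HLS library. In the model below, each value carries a count of how many dependent saturating additions produced it. The Tracked struct, sat_add, serial_sum, and tree_sum are hypothetical names, and saturation is modeled on a 16-bit signed word rather than on an actual ap_fixpt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr int TAPS = 8;

// Hypothetical model: a value plus the length of the chain of dependent
// saturating adds that produced it.
struct Tracked {
    int32_t value;
    int depth;
};

// Adds two values, saturates to the 16-bit signed range, and records
// that the result depends on one more saturating operation.
Tracked sat_add(Tracked a, Tracked b) {
    int32_t sum = a.value + b.value;
    sum = std::min<int32_t>(std::max<int32_t>(sum, -32768), 32767);
    return {sum, std::max(a.depth, b.depth) + 1};
}

// Accumulates into a single variable: every add waits on the previous one.
Tracked serial_sum(const Tracked (&x)[TAPS]) {
    Tracked acc{0, 0};
    for (int i = 0; i < TAPS; ++i) {
        acc = sat_add(acc, x[i]);
    }
    return acc;
}

// Accumulates in a binary tree, using the same loop shape as the example.
Tracked tree_sum(const Tracked (&x)[TAPS]) {
    Tracked work[TAPS];
    std::copy(x, x + TAPS, work);
    for (int i = TAPS >> 1; i > 0; i >>= 1) {
        for (int j = 0; j < i; ++j) {
            work[j] = sat_add(work[j], work[j + i]);
        }
    }
    return work[0];
}
```

For eight inputs that each start at depth 0, serial_sum reports a depth of 8 and tree_sum a depth of 3. Both return the same sum as long as no intermediate result saturates. Saturating addition is not associative, so the two orderings can produce different results once saturation occurs.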