Arithmetic With ap_[u]fixpt Types


The Arbitrary Precision Fixed Point library supports all standard arithmetic, logical bitwise, shift, and comparison operations. During arithmetic, intermediate results are kept in a type wide enough to hold every possible resulting value. Operands are shifted to line up their decimal points, then sign- or zero-extended to match widths before an operation is performed. Whenever the result of a calculation can be negative, the intermediate type is an ap_fixpt rather than an ap_ufixpt, regardless of whether any of the operands were ap_fixpt. Overflow and quantization handling happen only when the result is assigned to a fixed point type.
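The width rules described above can be sketched in plain C++ without the library. This is only an illustrative model: the `Format` struct and the `add_unsigned_format` and `mul_format` helpers are hypothetical names, not part of the ap_[u]fixpt API, and the addition rule is modelled for two unsigned operands only, since the rule for mixed signedness is not spelled out here.

```cpp
#include <algorithm>

// Hypothetical descriptor of a fixed point format (not a library type):
// `width` total bits, `int_width` of them above the decimal point.
struct Format {
    int width;
    int int_width;
    bool is_signed;
    constexpr int frac_width() const { return width - int_width; }
};

// Sum of two UNSIGNED formats: keep the longer fractional part, take the
// larger integer part, and add one integer bit for the carry.
constexpr Format add_unsigned_format(Format a, Format b) {
    return Format{std::max(a.int_width, b.int_width) + 1 +
                      std::max(a.frac_width(), b.frac_width()),
                  std::max(a.int_width, b.int_width) + 1,
                  false};
}

// Product: total widths and integer widths add; the result is treated as
// signed if either operand is signed, because it can then be negative.
constexpr Format mul_format(Format a, Format b) {
    return Format{a.width + b.width, a.int_width + b.int_width,
                  a.is_signed || b.is_signed};
}
```

Under this model, adding an unsigned <65, 14> to an unsigned <15, 15> gives an unsigned <67, 16>, and multiplying an unsigned <15, 15> by a signed <8, 4> gives a signed <23, 19>.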

Important: Overflow and quantization handling is not performed for the assigning shift operations (<<=, >>=) on ap_[u]fixpt types. Also, non-assigning shifts (<<, >>, .ashr(x)) do not change the width or type of the fixed point value they are applied to. This means that bits can be shifted out of range and lost.
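To see what "shifted out of range" means, the following sketch models a width-preserving left shift on the raw bits of an unsigned 8-bit value with 4 fractional bits. It uses only standard C++ integers; `to_real` and `shl_keep_width` are hypothetical helper names and do not reflect how the library is implemented.

```cpp
#include <cstdint>

// Interpret raw bits as an unsigned 8-bit value with 4 fractional bits.
constexpr double to_real(std::uint8_t raw) { return raw / 16.0; }

// Left shift that keeps the 8-bit width: bits pushed past the most
// significant bit are simply dropped, with no saturation.
constexpr std::uint8_t shl_keep_width(std::uint8_t raw, unsigned n) {
    return static_cast<std::uint8_t>(raw << n);
}
```

With raw bits 0xB0 (11.0), shifting left by 2 would mathematically give 44.0, but the model keeps only the low 8 bits, 0xC0, which reads back as 12.0.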

Fixed point types can be mixed freely with other arbitrary precision and C++ numeric types for arithmetic, logical bitwise, and comparison operations, with some caveats for floating point types.

Important: For arithmetic and logical bitwise operations, floating point types must be explicitly cast to an ap_[u]fixpt type before being used, because of the wide range of possible values a floating point type can represent. It is also a good idea, but not required, to use ap_[u]int types in place of C++ integers when a narrower width is sufficient.

Important: For convenience, floating point types can be used directly in fixed point comparisons. However, the floating point value is truncated and wrapped as if it had been assigned to a signed ap_fixpt just big enough to hold all values of the ap_[u]fixpt type being compared against, with the AP_TRN and AP_WRAP modes on.
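The consequence of this truncate-and-wrap step can be surprising, so here is a rough model in standard C++. It rests on two assumptions that are not stated in the text above: that AP_TRN rounds toward negative infinity, and that the "just big enough" signed type for an ap_ufixpt<8, 4> is 9 bits wide with 4 fractional bits. `floor_to_int` and `quantize_trn_wrap` are hypothetical helpers, not library functions.

```cpp
#include <cstdint>

// Round toward negative infinity (the assumed behaviour of AP_TRN).
constexpr std::int64_t floor_to_int(double v) {
    return static_cast<std::int64_t>(v) > v ? static_cast<std::int64_t>(v) - 1
                                            : static_cast<std::int64_t>(v);
}

// Convert x to the raw bits of a signed format with `width` total bits and
// `frac` fractional bits: truncate extra precision, then wrap on overflow.
constexpr std::int64_t quantize_trn_wrap(double x, int width, int frac) {
    const std::int64_t modulus = std::int64_t{1} << width;
    const std::int64_t scaled =
        floor_to_int(x * static_cast<double>(std::int64_t{1} << frac));
    // Reduce into [0, modulus), then map the upper half to negative values.
    std::int64_t wrapped = ((scaled % modulus) + modulus) % modulus;
    if (wrapped >= modulus / 2) wrapped -= modulus;
    return wrapped;
}
```

In this model, a fixed point 3.25 with 4 fractional bits has raw bits 52, and `quantize_trn_wrap(3.26, 9, 4)` is also 52, so the two would compare equal. A value outside the range, such as 19.25, wraps to raw -204 (that is, -12.75) before the comparison takes place.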

An example demonstrating some of this behaviour is shown below.

#include "hls/ap_fixpt.hpp"
#include <iostream>
#include <stdio.h>

using namespace hls;

    //...
    ap_ufixpt<65, 14> a = 32.5714285713620483875274658203125;
    ap_ufixpt<15, 15> b = 7;
    ap_fixpt<8, 4> c = -3.125;

    // the resulting type is wide enough to hold all
    // 51 fractional bits of a, and 15 integer bits of b
    // the width, and integer width are increased by 1 to hold
    // all possible results of the addition
    ap_ufixpt<67, 16> d = a + b; // 39.5714285713620483875274658203125
    std::cout << "d = " << d << std::endl;
    // the resulting type is a signed fixed point
    // with width, and integer width that are the sum
    // of the two operands' widths
    ap_fixpt<23, 19> e = b * c; // -21.875
    std::cout << "e = " << e << std::endl;
    // Assignment triggers the AP_TRN_ZERO quantization mode
    ap_fixpt<8, 7, AP_TRN_ZERO> f = e; // -21.5
    std::cout << "f = " << f << std::endl;
    // Keep only the bits above the decimal point (clears the fractional bit)
    f &= 0xFF; // -22
    std::cout << "f = " << f << std::endl;
    // Assignment triggers the AP_SAT overflow mode,
    // and saturates the negative result (-21.875) to 0
    ap_ufixpt<8, 4, AP_TRN, AP_SAT> g = b * c; // 0
    std::cout << "g = " << g << std::endl;
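The quantization and saturation steps in the example can be checked by hand on raw integers. The sketch below is a plain C++ model, not library code: `trn_zero` and `sat_unsigned` are hypothetical helpers, and the raw value -350 is -21.875 expressed with 4 fractional bits.

```cpp
#include <algorithm>
#include <cstdint>

// Drop `drop` fractional bits, truncating toward zero (as AP_TRN_ZERO is
// described). C++ integer division already truncates toward zero.
constexpr std::int64_t trn_zero(std::int64_t raw, int drop) {
    return raw / (std::int64_t{1} << drop);
}

// Clamp raw bits to the range of an unsigned `width`-bit value, modelling
// saturation on assignment to an unsigned type.
constexpr std::int64_t sat_unsigned(std::int64_t raw, int width) {
    return std::max<std::int64_t>(
        0, std::min<std::int64_t>(raw, (std::int64_t{1} << width) - 1));
}
```

Going from 4 fractional bits to 1 drops 3 bits: `trn_zero(-350, 3)` is -43, which is -21.5 with 1 fractional bit. Saturating the same negative raw value into an unsigned 8-bit range gives 0, while a value that is too large, such as raw 4432, clamps to 255.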

v2024.2
