5.5.3 Multiplication

The PIC18 instruction set includes several 8-bit by 8-bit hardware multiply instructions, which the compiler uses in many situations. Non-PIC18 targets always use a library routine for multiplication operations.

There are three ways that 8x8-bit integer multiplication can be implemented by the compiler:

Hardware Multiply Instructions (HMI)
These assembly instructions are the most efficient method of multiplication, but they are only available on PIC18 devices.
A bitwise iteration (8loop)
Where dedicated multiplication instructions are not available, this implementation produces the smallest amount of code – a loop cycles through the bit pattern in the operands and constructs the result bit-by-bit.

The speed of this implementation varies and is dependent on the operand values; however, this is typically the slowest method of performing multiplication.

An unrolled bitwise sequence (8seq)
This implementation performs a sequence of instructions that is identical to the bitwise iteration (above), but the loop is unrolled.

The generated code is larger, but execution is faster than the loop version.
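
The two software methods above can be sketched in C as follows. This is only an illustration of the shift-and-add idea, not the compiler's library source: the function names mul8_loop and mul8_seq are hypothetical, and the sketch assumes that an 8-bit multiplication keeps only the low 8 bits of the product and that the loop stops once the remaining multiplier bits are all zero.

```c
#include <stdint.h>

/* Hypothetical sketch of a bitwise iteration (8loop-style) multiply.
 * Assumption: the result is truncated to 8 bits. */
uint8_t mul8_loop(uint8_t multiplier, uint8_t multiplicand)
{
    uint8_t product = 0;

    /* Examine the multiplier one bit at a time, lowest bit first. */
    while (multiplier != 0) {
        if (multiplier & 1u) {
            product = (uint8_t)(product + multiplicand);
        }
        multiplicand = (uint8_t)(multiplicand << 1);
        multiplier >>= 1;
    }
    return product;
}

/* One shift-and-add step, identical to the loop body above. */
#define MUL8_STEP()                                          \
    do {                                                     \
        if (multiplier & 1u) {                               \
            product = (uint8_t)(product + multiplicand);     \
        }                                                    \
        multiplicand = (uint8_t)(multiplicand << 1);         \
        multiplier >>= 1;                                    \
    } while (0)

/* Hypothetical sketch of the unrolled (8seq-style) variant:
 * the same eight steps written out with no loop test or branch back. */
uint8_t mul8_seq(uint8_t multiplier, uint8_t multiplicand)
{
    uint8_t product = 0;

    MUL8_STEP(); MUL8_STEP(); MUL8_STEP(); MUL8_STEP();
    MUL8_STEP(); MUL8_STEP(); MUL8_STEP(); MUL8_STEP();
    return product;
}
```

Both functions return the same value for any pair of operands. The unrolled form always executes eight steps but avoids the loop overhead, which is the size-for-speed trade-off described above.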

Multiplication of operands larger than 8 bits can be performed in one of the following two ways:

A bitwise iteration (xloop)
This is the same algorithm used by 8-bit multiplication (above) but the loop runs over all (x) bits of the operands.

Like its 8-bit counterpart, this implementation produces the smallest amount of code but is typically the slowest method of performing multiplication.

A bytewise decomposition (bytdec)
This is a decomposition of the multiplication into a summation of many 8-bit multiplications. The 8-bit multiplications can then be performed using any of the methods described above.

This decomposition is advantageous for PIC18 devices, which can then use hardware multiply instructions.

For other devices, this method is still fast, but the code size can become impractical.
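
As a rough illustration of bytewise decomposition, a 16-bit multiplication can be split into 8-bit partial products. Writing a = 256*a1 + a0 and b = 256*b1 + b0 gives a*b = a0*b0 + 256*(a1*b0 + a0*b1) + 65536*a1*b1, and the last term disappears when the result is truncated to 16 bits. The sketch below assumes such a truncated 16-bit result; the names mul8x8 and mul16_bytdec are hypothetical, and mul8x8 merely stands in for whichever 8-bit method (HMI, 8seq or 8loop) is in use.

```c
#include <stdint.h>

/* Hypothetical stand-in for an 8-bit by 8-bit multiply that yields a
 * 16-bit product, as a hardware multiply instruction would. */
static uint16_t mul8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a * b);
}

/* Hypothetical sketch: 16-bit multiply built from 8-bit multiplies.
 * Assumption: the result is truncated to 16 bits, so the a1*b1
 * partial product is never needed. */
uint16_t mul16_bytdec(uint16_t a, uint16_t b)
{
    uint8_t a0 = (uint8_t)(a & 0xFFu);
    uint8_t a1 = (uint8_t)(a >> 8);
    uint8_t b0 = (uint8_t)(b & 0xFFu);
    uint8_t b1 = (uint8_t)(b >> 8);

    /* Low partial product contributes all 16 of its bits. */
    uint16_t low = mul8x8(a0, b0);

    /* Cross terms are weighted by 256; only their low bytes survive. */
    uint16_t cross = (uint16_t)(mul8x8(a1, b0) + mul8x8(a0, b1));

    return (uint16_t)(low + (uint16_t)(cross << 8));
}
```

Three 8-bit multiplications are needed here. Wider operands need more partial products, which is why the code size grows quickly when each one is performed in software.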

Multiplication of floating-point operands operates in a similar way – the integer mantissas can be multiplied using either a bitwise loop (xfploop) or a bytewise decomposition.

The following tables indicate which of the multiplication methods are chosen by the compiler when performing multiplication of both integer and floating-point operands. The method depends on the size of the operands, the type of optimizations enabled and the target device.

The table below shows the methods chosen when speed optimizations are enabled (see 4.6.6 Options for Controlling Optimization).

Table 5-8. Multiplication With Speed Optimizations
Device              8-bit  16-bit        24-bit        32-bit        24-bit FP     32-bit FP
PIC18               HMI    bytdec+HMI    bytdec+HMI    bytdec+HMI    bytdec+HMI    bytdec+HMI
Enhanced Mid-range  8seq   bytdec+8seq   bytdec+8seq   bytdec+8seq   bytdec+8seq   bytdec+8seq
Mid-range/Baseline  8seq   16loop        24loop        32loop        24fploop      32fploop

The table below shows the methods chosen when space optimizations are enabled or when no C-level optimizations are enabled.

Table 5-9. Multiplication With No Or Space Optimizations
Device              8-bit  16-bit        24-bit        32-bit        24-bit FP     32-bit FP
PIC18               HMI    bytdec+HMI    24loop        32loop        24fploop      32fploop
Enhanced Mid-range  8loop  bytdec+8loop  24loop        32loop        24fploop      32fploop
Mid-range/Baseline  8loop  16loop        24loop        32loop        24fploop      32fploop

The source code for the multiplication routines (documented with the algorithms employed) is available in the pic/c99/sources directory, located in the compiler’s installation directory. Look for files whose names have the form Umulx.c, where x is the size of the operation in bits.

If your device and optimization settings dictate the use of a bitwise multiplication loop, you can sometimes arrange the multiplication operands in your C code to improve the operation’s speed. Where possible, ensure that the left operand to the multiplication is the smaller of the two operands.

For example, in the code:

x = 10;
y = 200;
result = x * y;  // first multiply
result = y * x;  // second multiply

the variable result will be assigned the same value in both statements, but the first multiplication expression will be performed faster than the second.
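
One plausible reason for the difference can be seen by counting loop iterations. The sketch below assumes that the bitwise loop shifts the left operand right on every pass and exits as soon as that operand reaches zero; this behavior and the name mul16_loop_counted are assumptions made for illustration, so consult the library sources for the routine your device actually uses.

```c
#include <stdint.h>

/* Hypothetical sketch of a 16-bit bitwise multiplication loop that also
 * reports how many iterations it performed.
 * Assumption: the loop ends when the left operand has no set bits left. */
uint16_t mul16_loop_counted(uint16_t left, uint16_t right,
                            unsigned *iterations)
{
    uint16_t product = 0;
    unsigned count = 0;

    while (left != 0) {
        if (left & 1u) {
            product = (uint16_t)(product + right);
        }
        right = (uint16_t)(right << 1);
        left >>= 1;
        count++;
    }
    *iterations = count;
    return product;
}
```

Under these assumptions, calling the function with left = 10 and right = 200 takes 4 iterations, because the highest set bit of 10 is bit 3, while swapping the operands takes 8 iterations, because the highest set bit of 200 is bit 7. Both calls return 2000.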