12.14 Auto-Vectorization to SIMD
The compiler supports auto-vectorization of loops at optimization level -O3. The advantage of auto-vectorization is that the compiler can recognize operations on scalar variables (of integer, fixed-point, or floating-point type) and map them onto SIMD (Single Instruction, Multiple Data) instructions automatically. In the ideal case, auto-vectorization removes the need to use SIMD variables explicitly.
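For contrast, the sketch below shows what using SIMD variables explicitly can look like for a byte-wise addition of two 32-element unsigned char arrays. It relies on GCC's generic vector extension (the vector_size attribute), not on any MIPS-specific DSP built-ins, and the names v4u8 and add8_explicit are hypothetical. Each v4u8 value holds four unsigned char lanes, and a single + adds all four lanes with modulo-256 wraparound.

```c
#include <string.h>

/* Hypothetical 4 x 8-bit vector type built with GCC's generic vector extension. */
typedef unsigned char v4u8 __attribute__((vector_size(4)));

unsigned char a[32], b[32], c[32];

/* Explicit-SIMD element-wise add: processes 4 bytes per iteration. */
void add8_explicit(void)
{
    for (int i = 0; i < 32; i += 4) {
        v4u8 va, vb, vc;
        memcpy(&va, &a[i], sizeof va);  /* load 4 packed bytes of a */
        memcpy(&vb, &b[i], sizeof vb);  /* load 4 packed bytes of b */
        vc = va + vb;                   /* lane-wise add, each lane wraps mod 256 */
        memcpy(&c[i], &vc, sizeof vc);  /* store 4 packed bytes into c */
    }
}
```

Auto-vectorization aims to derive similar packed code from a plain scalar loop, so a hand-written form like this is mainly useful when the compiler cannot vectorize a loop on its own.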
Example:
/* add8.c */
unsigned char a[32], b[32], c[32];

void add8(void) {
    int i;
    for (i = 0; i < 32; i++)
    {
        c[i] = a[i] + b[i];
    }
}
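# Illustrative generated assembly code
add8:
    lui     v0,0x0
    addiu   v0,v0,0           # v0 -> one source array
    lui     a0,0x0
    addiu   a0,a0,0           # a0 -> the other source array
    lui     v1,0x0
    addiu   v1,v1,0           # v1 -> c
    addiu   a3,v0,32          # a3 = end of the 32-byte source array
    lw      a2,0(a0)          # loop head (add8+0x1c): load 4 bytes
    lw      a1,0(v0)          # load 4 bytes
    addiu   v0,v0,4
    addiu   a0,a0,4
    addu.qb a1,a2,a1          # add 4 unsigned bytes in one instruction
    addiu   v1,v1,4
    bne     v0,a3,1c <add8+0x1c>
    sw      a1,-4(v1)         # branch delay slot: store 4 results to c
    jr      ra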
In add8.c, the elements of two arrays of unsigned char are added together. The compiler automatically generates the addu.qb instruction, which adds four 8-bit elements packed in one 32-bit register at a time, so the loop executes 8 iterations instead of 32.
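To make the packed arithmetic concrete, the portable C sketch below emulates what one addu.qb computes: four independent 8-bit additions inside a 32-bit word, each wrapping modulo 256, with no carry crossing into the neighboring byte. The function name addu_qb_emul is hypothetical, and this bit-twiddling (SWAR) formulation is only a model for reasoning about the result; it is not how the hardware implements the instruction.

```c
#include <stdint.h>

/* Hypothetical software model of addu.qb: lane-wise unsigned byte add, mod 256. */
uint32_t addu_qb_emul(uint32_t x, uint32_t y)
{
    /* Add the low 7 bits of every byte. Each partial sum is at most
       127 + 127 = 254, so nothing carries into the next byte. */
    uint32_t low = (x & 0x7F7F7F7Fu) + (y & 0x7F7F7F7Fu);

    /* Bit 7 of each lane must be x7 XOR y7 XOR (carry out of bit 6); that
       carry is already in 'low'. XOR-ing the top bits back in completes the
       sum and discards the carry out of bit 7, so each byte wraps around. */
    return low ^ ((x ^ y) & 0x80808080u);
}
```

For example, adding 0xFF80017F and 0x01800101 lane by lane gives 0x00000280: the two most significant bytes wrap to 0x00 instead of carrying into their neighbors.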
For existing C code, first compile at the -O3 optimization level without any modifications to see whether the compiler can auto-vectorize the loops. If a loop body is too complex, the compiler may not be able to auto-vectorize the loop; in that case, consider restructuring and simplifying the loop.
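As one illustration of such restructuring, the sketch below (function names are hypothetical) takes a loop whose trip count depends on the data, because it stops at the first zero element of a, and splits it into a search loop followed by a simple counted loop. A data-dependent early exit is a common obstacle for vectorizers, while the second loop in the split form has the same shape as add8.c. Whether a particular compiler version actually vectorizes the split form is not guaranteed; inspecting the generated assembly, as in the listing above, is the way to confirm it.

```c
#include <stddef.h>

/* Harder to vectorize: the loop may exit early depending on the data. */
size_t add_until_zero(const unsigned char *a, const unsigned char *b,
                      unsigned char *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (a[i] == 0)
            break;                /* data-dependent exit */
        c[i] = a[i] + b[i];       /* result wraps mod 256 on store */
    }
    return i;
}

/* Restructured: compute the trip count first, then run a counted loop
   whose iterations are independent, like the loop in add8.c. */
size_t add_until_zero_split(const unsigned char *a, const unsigned char *b,
                            unsigned char *c, size_t n)
{
    size_t len = 0;
    while (len < n && a[len] != 0)   /* scalar search for the first zero */
        len++;
    for (size_t i = 0; i < len; i++) /* simple candidate for vectorization */
        c[i] = a[i] + b[i];
    return len;
}
```

Both functions write the same elements of c and return the same count; only the loop structure differs.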