12.14 Auto-Vectorization to SIMD

The compiler supports auto-vectorization of loops at optimization level -O3. The advantage of auto-vectorization is that the compiler recognizes loops that operate on scalar variables (which can be integer, fixed-point, or floating-point types) and automatically uses SIMD (Single Instruction, Multiple Data) instructions for them, processing several elements per instruction. In the ideal case, auto-vectorization removes the need to use SIMD variables explicitly.
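For contrast, here is a sketch of a 32-element byte addition written both ways: as a plain scalar loop, which is the form auto-vectorization works on, and with explicit SIMD variables. It assumes the GCC/Clang `vector_size` extension for the explicit vector type; `v4u8`, `add_scalar`, `add_explicit`, and `check_same` are hypothetical names, and the SIMD types provided by this compiler may be named differently.

```c
#include <assert.h>
#include <string.h>

#define N 32

/* Hypothetical explicit SIMD type: four unsigned bytes in one 32-bit vector
   (assumes the GCC/Clang vector_size extension). */
typedef unsigned char v4u8 __attribute__((vector_size(4)));

/* Scalar form: at -O3 the auto-vectorizer may turn this into SIMD code. */
void add_scalar(unsigned char *c, const unsigned char *a, const unsigned char *b) {
  for (int i = 0; i < N; i++)
    c[i] = (unsigned char)(a[i] + b[i]);
}

/* Explicit form: the programmer packs four elements per operation by hand. */
void add_explicit(unsigned char *c, const unsigned char *a, const unsigned char *b) {
  for (int i = 0; i < N; i += 4) {
    v4u8 va, vb, vc;
    memcpy(&va, a + i, sizeof va);   /* load four bytes */
    memcpy(&vb, b + i, sizeof vb);
    vc = va + vb;                    /* lane-wise add; each lane wraps modulo 256 */
    memcpy(c + i, &vc, sizeof vc);   /* store four bytes */
  }
}

/* Usage check: both forms must produce identical results. */
void check_same(void) {
  unsigned char a[N], b[N], c1[N], c2[N];
  for (int i = 0; i < N; i++) {
    a[i] = (unsigned char)(i * 37);
    b[i] = (unsigned char)(200 + i);
  }
  add_scalar(c1, a, b);
  add_explicit(c2, a, b);
  assert(memcmp(c1, c2, sizeof c1) == 0);
}
```

The explicit version ties the source to a particular vector width; the scalar version leaves that choice to the compiler.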

Example:

  /* add8.c */
  unsigned char a[32], b[32], c[32];
  void add8() {
    int i;
    for (i = 0; i < 32; i++)
    {
      c[i] = a[i] + b[i];
    }
  }

Illustrative assembly code generated for add8.c:

  add8:
    lui     v0,0x0
    addiu   v0,v0,0
    lui     a0,0x0
    addiu   a0,a0,0
    lui     v1,0x0
    addiu   v1,v1,0
    addiu   a3,v0,32
    lw      a2,0(a0)
    lw      a1,0(v0)
    addiu   v0,v0,4
    addiu   a0,a0,4
    addu.qb a1,a2,a1
    addiu   v1,v1,4
    bne     v0,a3,1c <add8+0x1c>
    sw      a1,-4(v1)
    jr      ra

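In add8.c, corresponding elements of two unsigned char arrays are added together. The compiler automatically generates the addu.qb instruction (from the MIPS DSP ASE), which adds the four unsigned bytes packed in a 32-bit register in a single operation, with each byte wrapping modulo 256 as the C code requires. As a result, the 32-element loop executes only eight times: each iteration loads one word from each array with lw, adds them with addu.qb, and stores the result with sw.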
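As an illustration only, the following portable C sketch models what the vectorized loop computes. `addu_qb_model` mimics the lane-wise, wrap-around behavior of addu.qb using ordinary 32-bit integer arithmetic (a SWAR, "SIMD within a register", technique), and `add8_model` mirrors the lw / addu.qb / sw sequence in the listing. All names are hypothetical; this is not how the compiler or the hardware implements the instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of addu.qb: adds the four bytes packed in x and y lane by lane,
   each lane wrapping modulo 256 (no carry into the neighbouring lane). */
uint32_t addu_qb_model(uint32_t x, uint32_t y) {
  /* Add the low 7 bits of every lane; each sum fits in 8 bits, so no
     carry crosses a lane boundary. */
  uint32_t low = (x & 0x7F7F7F7Fu) + (y & 0x7F7F7F7Fu);
  /* Fold in bit 7 of each lane with XOR, discarding the lane's carry-out. */
  return low ^ ((x ^ y) & 0x80808080u);
}

/* Model of the vectorized add8 loop: 32 bytes handled as eight words. */
void add8_model(unsigned char *c, const unsigned char *a, const unsigned char *b) {
  for (int i = 0; i < 32; i += 4) {
    uint32_t wa, wb, wc;
    memcpy(&wa, a + i, sizeof wa);   /* corresponds to lw */
    memcpy(&wb, b + i, sizeof wb);   /* corresponds to lw */
    wc = addu_qb_model(wa, wb);      /* corresponds to addu.qb */
    memcpy(c + i, &wc, sizeof wc);   /* corresponds to sw */
  }
}

/* Usage check: the model matches the scalar loop from add8.c. */
void check_add8_model(void) {
  unsigned char a[32], b[32], ref[32], out[32];
  for (int i = 0; i < 32; i++) {
    a[i] = (unsigned char)(i * 29);
    b[i] = (unsigned char)(240 + i);
    ref[i] = (unsigned char)(a[i] + b[i]);
  }
  add8_model(out, a, b);
  assert(memcmp(out, ref, sizeof ref) == 0);
}
```

Because every lane is computed independently, byte order inside the word does not affect the result.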

For existing C code, first compile at the -O3 optimization level without any modifications to see whether the compiler can auto-vectorize the loops. If a loop body is too complex, the compiler may not be able to auto-vectorize the loop; in that case, consider restructuring and simplifying the loop.
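One common restructuring is loop distribution (fission): splitting a loop that mixes independent element-wise work with a loop-carried dependence into two simpler loops. The sketch below assumes a hypothetical task that adds two byte arrays and also records a running sum; the function names are illustrative, the arrays are assumed not to overlap `run`, and whether a particular compiler vectorizes either form should be confirmed by inspecting the generated assembly.

```c
#include <assert.h>
#include <string.h>

#define N 32

/* Before: one loop mixes an independent element-wise add with a running
   sum; run[i] depends on run[i - 1] (a loop-carried dependence). */
void add_and_scan_fused(unsigned char *c, unsigned int *run,
                        const unsigned char *a, const unsigned char *b) {
  unsigned int acc = 0;
  for (int i = 0; i < N; i++) {
    c[i] = (unsigned char)(a[i] + b[i]);
    acc += c[i];
    run[i] = acc;
  }
}

/* After: the element-wise add is isolated in a simple loop shaped like
   add8, and only the running sum remains in a separate scalar loop. */
void add_and_scan_split(unsigned char *c, unsigned int *run,
                        const unsigned char *a, const unsigned char *b) {
  for (int i = 0; i < N; i++)
    c[i] = (unsigned char)(a[i] + b[i]);

  unsigned int acc = 0;
  for (int i = 0; i < N; i++) {
    acc += c[i];
    run[i] = acc;
  }
}

/* Usage check: the restructured loops must give the same results. */
void check_split_matches_fused(void) {
  unsigned char a[N], b[N], c1[N], c2[N];
  unsigned int r1[N], r2[N];
  for (int i = 0; i < N; i++) {
    a[i] = (unsigned char)(i * 13);
    b[i] = (unsigned char)(255 - i);
  }
  add_and_scan_fused(c1, r1, a, b);
  add_and_scan_split(c2, r2, a, b);
  assert(memcmp(c1, c2, sizeof c1) == 0);
  assert(memcmp(r1, r2, sizeof r1) == 0);
}
```

The split version does slightly more memory traffic, since c is written and then re-read, in exchange for giving the vectorizer a loop body as simple as the one in add8.c.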