7.6.6.1 Options For Specific Optimization Control

Table 7-12. Specific Optimization Options

-falign-functions

-falign-functions=n

Align the start of functions to the next power-of-two greater than or equal to n, skipping up to n - 1 bytes. For instance, -falign-functions=32 aligns functions to the next 32-byte boundary, but -falign-functions=24 would align to the next 32-byte boundary only if this can be done by skipping 23 bytes or less.

-fno-align-functions and -falign-functions=1 are equivalent and mean that functions will not be aligned.

Because the assembler supports this option only when n is a power of two, n is rounded up to the next power of two. If n is not specified, a machine-dependent default is used.
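The alignment rule above can be written out as plain arithmetic. The C below is a hypothetical model of that rule, not compiler code: next_pow2 and aligned_start are invented names, and the real padding is emitted by the compiler and assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the -falign-functions=n rule: round n up to a
   power of two p, then pad to the next multiple of p only if that takes
   at most n - 1 bytes of padding. */

/* Smallest power of two greater than or equal to n (n >= 1). */
static uint32_t next_pow2(uint32_t n)
{
    uint32_t p = 1;
    assert(n >= 1);
    while (p < n)
        p <<= 1;
    return p;
}

/* Start address of a function that would otherwise begin at addr. */
static uint32_t aligned_start(uint32_t addr, uint32_t n)
{
    uint32_t p = next_pow2(n);
    uint32_t pad = (p - (addr % p)) % p;      /* bytes needed to reach the boundary */
    return (pad <= n - 1) ? addr + pad : addr; /* too far: leave unaligned */
}
```

With n = 24, a function that would start at 0x110 moves to 0x120 (16 bytes of padding), while one at 0x101 is left unaligned because reaching 0x120 would take 31 bytes.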

-falign-labels

-falign-labels=n

Align all branch targets to a power-of-two boundary, with the same skip limit as -falign-functions. This option can easily make code slower, because the padding consists of dummy operations that must be executed whenever the branch target is reached by falling through in the normal flow of the code.

If -falign-loops or -falign-jumps are applicable and are greater than this value, then their values are used instead.

If n is not specified, a machine-dependent default is used, which is very likely to be 1, meaning no alignment.

-falign-loops

-falign-loops=n

Align loops to a power-of-two boundary, with the same skip limit as -falign-functions. The hope is that the loop will be executed many times, which will make up for the cost of executing the dummy operations.

If n is not specified, a machine-dependent default is used.

-fcaller-saves

Enable values to be allocated in registers that will be clobbered by function calls, by emitting extra instructions to save and restore the registers around such calls. Such allocation is done only when it seems to result in better code than would otherwise be produced.

-fcse-follow-jumps

In common subexpression elimination, scan through jump instructions when the target of the jump is not reached by any other path. For example, when CSE encounters an if statement with an else clause, CSE will follow the jump when the condition tested is false.
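At the source level, the effect of following the jump into the else clause resembles the hand-written rewrite below. This is only a sketch: the compiler works on its internal representation rather than on C source, and the function names are hypothetical.

```c
#include <assert.h>

/* As written: a * b is evaluated for the test and again in the else
   clause, which is reached only by the jump taken when the test fails. */
static int follow_as_written(int a, int b, int c)
{
    if (a * b > c)
        return 1;
    else
        return a * b + c;
}

/* Hand-written equivalent of what CSE can achieve when it follows that
   jump: the product computed for the test is reused in the else clause. */
static int follow_after_cse(int a, int b, int c)
{
    int t = a * b;
    if (t > c)
        return 1;
    return t + c;
}

/* Both versions must agree for inputs that do not overflow. */
static void follow_check(int a, int b, int c)
{
    assert(follow_as_written(a, b, c) == follow_after_cse(a, b, c));
}
```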

-fcse-skip-blocks

This is similar to -fcse-follow-jumps, but causes CSE to follow jumps which conditionally skip over blocks. When CSE encounters a simple if statement with no else clause, -fcse-skip-blocks causes CSE to follow the jump around the body of the if.
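A similar source-level sketch for an if statement with no else clause: the code after the if is reached both by falling through the body and by the jump around it, and because the body does not change x or y, the x + y computed before the if can be reused afterward. The function names are hypothetical.

```c
#include <assert.h>

/* As written: the if has no else, so a conditional jump skips its body.
   x + y is computed before the if and again after it. */
static int skip_as_written(int x, int y, int flag)
{
    int r = x + y;
    if (flag)
        r *= 2;              /* body skipped by the jump when flag is 0 */
    return r + (x + y);
}

/* Hand-written equivalent once CSE follows the jump around the body:
   the body does not change x or y, so the first x + y is reused. */
static int skip_after_cse(int x, int y, int flag)
{
    int t = x + y;
    int r = t;
    if (flag)
        r *= 2;
    return r + t;
}

/* Both versions must agree for inputs that do not overflow. */
static void skip_check(int x, int y, int flag)
{
    assert(skip_as_written(x, y, flag) == skip_after_cse(x, y, flag));
}
```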

-fexpensive-optimizations

Perform a number of minor optimizations that are relatively expensive.

-ffunction-sections

-fdata-sections

Place each function or data item into its own section in the output file. The name of the function or the name of the data item determines the section’s name in the output file.

Use these options only when there are significant benefits from doing so. When you specify these options, the assembler and linker may create larger object and executable files and will also be slower. See also “The -ffunction-sections Option.”

-fgcse

Perform a global common subexpression elimination pass. This pass also performs global constant and copy propagation.
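The hand-written C below sketches, at the source level, what global CSE combined with constant and copy propagation can do to a small function whose common expression spans several basic blocks. It is an illustration only; the compiler performs these steps on its internal representation, and the names are hypothetical.

```c
#include <assert.h>

/* As written: k is a constant, j is a copy of i, and i * k is needed
   in both branches and again after they join. */
static int gcse_as_written(int i, int cond)
{
    int k = 4;
    int j = i;                /* copy of i */
    int r;
    if (cond)
        r = i * k + 1;
    else
        r = j * k - 1;        /* same product once j is replaced by i */
    return r + i * k;
}

/* Hand-written result of constant propagation (k becomes 4), copy
   propagation (j becomes i) and global CSE of i * 4 across the branches. */
static int gcse_after(int i, int cond)
{
    int t = i * 4;
    int r = cond ? t + 1 : t - 1;
    return r + t;
}

/* Both versions must agree for inputs that do not overflow. */
static void gcse_check(int i, int cond)
{
    assert(gcse_as_written(i, cond) == gcse_after(i, cond));
}
```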

-fgcse-lm

When -fgcse-lm is enabled, global common subexpression elimination will attempt to move loads which are only killed by stores into themselves. This allows a loop containing a load/store sequence to be changed to a load outside the loop, and a copy/store within the loop.
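A source-level sketch of the load/store loop described above, with hypothetical names: the load of *total moves out of the loop, while a register copy and the store stay inside it. Because the only store that can change *total is the store into *total itself, the copy always matches memory.

```c
#include <assert.h>
#include <stddef.h>

/* As written: every iteration loads *total, adds a[i], and stores it back. */
static void lm_as_written(int *total, const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *total = *total + a[i];
}

/* Hand-written shape after load motion: *total is loaded once before the
   loop; a register copy is updated and stored back on every iteration. */
static void lm_after(int *total, const int *a, size_t n)
{
    int t = *total;               /* load moved out of the loop */
    for (size_t i = 0; i < n; i++) {
        t = t + a[i];
        *total = t;               /* copy and store stay inside the loop */
    }
}

/* Runs both versions from the same starting value and compares results. */
static void lm_check(const int *a, size_t n, int start)
{
    int x = start, y = start;
    lm_as_written(&x, a, n);
    lm_after(&y, a, n);
    assert(x == y);
}
```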

-fgcse-sm

When -fgcse-sm is enabled, a store motion pass is run after global common subexpression elimination. This pass will attempt to move stores out of loops. When used in conjunction with -fgcse-lm, loops containing a load/store sequence can be changed to a load before the loop and a store after the loop.
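For a loop that accumulates into *total, the combination of load motion and store motion can leave one load before the loop and one store after it. The hand-written sketch below (hypothetical names) shows that shape; as its comment notes, the rewrite is only equivalent when total does not point into a[], which the compiler has to establish before moving the store.

```c
#include <assert.h>
#include <stddef.h>

/* As written: every iteration loads *total, adds a[i], and stores it back. */
static void sm_as_written(int *total, const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *total = *total + a[i];
}

/* Hand-written shape after -fgcse-lm plus -fgcse-sm: one load before the
   loop and one store after it. Only equivalent when total does not point
   into a[]; the compiler must prove that before moving the store. */
static void sm_after(int *total, const int *a, size_t n)
{
    int t = *total;               /* load moved before the loop */
    for (size_t i = 0; i < n; i++)
        t = t + a[i];
    *total = t;                   /* store moved after the loop */
}

/* Compares both versions on a total that is separate from a[]. */
static void sm_check(const int *a, size_t n, int start)
{
    int x = start, y = start;
    sm_as_written(&x, a, n);
    sm_after(&y, a, n);
    assert(x == y);
}
```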

-fno-defer-pop

Always pop the arguments to each function call as soon as that function returns. The compiler normally lets arguments accumulate on the stack for several function calls and pops them all at once.

-fno-peephole

-fno-peephole2

Disable machine-specific peephole optimizations. Peephole optimizations occur at various points during the compilation. -fno-peephole disables peephole optimization on machine instructions, while -fno-peephole2 disables high level peephole optimizations. To disable peephole optimization entirely, use both options.

-foptimize-register-move

-fregmove

Attempt to reassign register numbers in move instructions and as operands of other simple instructions in order to maximize the amount of register tying.

-fregmove and -foptimize-register-move are the same optimization.

-frename-registers

Attempt to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization will most benefit processors with lots of registers. It can, however, make debugging impossible, since variables will no longer stay in a “home register”.

-frerun-cse-after-loop

Rerun common subexpression elimination after loop optimizations have been performed.

-frerun-loop-opt

Run the loop optimizer twice.

-fschedule-insns

Attempt to reorder instructions to eliminate Read-After-Write stalls (see your device Family Reference Manual (FRM) for more details). Typically improves performance with no impact on code size.

-fschedule-insns2

Similar to -fschedule-insns, but requests an additional pass of instruction scheduling after register allocation has been done.

-fstrength-reduce

Perform the optimizations of loop strength reduction and elimination of iteration variables.
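Strength reduction replaces a multiplication by an induction variable with a cheaper addition, and iteration-variable elimination can then drop the loop counter in favor of a pointer comparison. The hand-written before/after pair below sketches both at the source level; the names are hypothetical, and unsigned arithmetic is used so the two versions agree even if the values wrap.

```c
#include <assert.h>
#include <stddef.h>

/* As written: the loop counter i is multiplied by 3 on every iteration. */
static void sr_as_written(unsigned *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (unsigned)i * 3u;
}

/* Hand-written shape after strength reduction: v tracks i * 3 by repeated
   addition, and the counter i is replaced by a pointer compared with end. */
static void sr_after(unsigned *a, size_t n)
{
    unsigned v = 0;
    for (unsigned *p = a, *end = a + n; p != end; p++) {
        *p = v;
        v += 3u;                  /* addition replaces the multiply */
    }
}

/* Fills two arrays of up to 16 elements with each version and compares them. */
static void sr_check(size_t n)
{
    unsigned x[16], y[16];
    assert(n <= 16);
    sr_as_written(x, n);
    sr_after(y, n);
    for (size_t i = 0; i < n; i++)
        assert(x[i] == y[i]);
}
```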

-fstrict-aliasing

Allows the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C, this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an unsigned int can alias an int, but not a void* or a double. A character type may alias any other type.

Pay special attention to code like this:

```c
union a_union {
  int i;
  double d;
};

int f() {
  union a_union t;
  t.d = 3.0;
  return t.i;
}
```

The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So the code above will work as expected, but the following code might not:

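```c
int f() {
  union a_union t;   /* C requires the union keyword here */
  int* ip;
  t.d = 3.0;
  ip = &t.i;
  return *ip;        /* read through an int pointer, not the union type */
}
```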
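As a concrete instance of the permitted union form, the sketch below reads the bit pattern of a float through a union with uint32_t. It assumes float is a 32-bit IEEE 754 type; the names float_word and float_bits are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes float is a 32-bit IEEE 754 type. */
union float_word {
    float f;
    uint32_t u;
};

/* Writes one member and reads the other through the union type, the form
   of type-punning described above as allowed under -fstrict-aliasing. */
static uint32_t float_bits(float x)
{
    union float_word t;
    assert(sizeof(float) == sizeof(uint32_t));
    t.f = x;
    return t.u;
}
```

Under that assumption, float_bits(1.0f) returns 0x3F800000.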

-fthread-jumps

Perform optimizations where a check is made to see if a jump branches to a location where another comparison subsumed by the first is found. If so, the first branch is redirected to either the destination of the second branch or a point immediately following it, depending on whether the condition is known to be true or false.
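The source-level sketch below shows the kind of control flow jump threading produces when the first condition is true: since x > 10 implies x > 5, that path can go straight to the body of the second if. The compiler does this on branches in its internal representation; the function names are hypothetical.

```c
#include <assert.h>

/* As written: two tests in sequence; x > 10 implies x > 5. */
static int tj_as_written(int x)
{
    int r = 0;
    if (x > 10)
        r += 100;
    if (x > 5)
        r += 1;
    return r;
}

/* Hand-written equivalent of the threaded branches: when x > 10 is true,
   control goes straight to the second body without re-testing x > 5. */
static int tj_after(int x)
{
    int r = 0;
    if (x > 10) {
        r += 100;
        r += 1;                   /* second condition known to be true */
    } else if (x > 5) {
        r += 1;                   /* reached only when x <= 10, so the test is needed */
    }
    return r;
}

/* Both versions must return the same value for every x. */
static void tj_check(int x)
{
    assert(tj_as_written(x) == tj_after(x));
}
```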

-funroll-loops

Perform the optimization of loop unrolling. This is only done for loops whose number of iterations can be determined at compile time or run time. -funroll-loops implies both -fstrength-reduce and -frerun-cse-after-loop.
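For a loop whose trip count n is known only at run time, one common unrolled shape is a main loop that handles several elements per iteration plus a remainder loop. The hand-unrolled C below sketches that shape with a factor of 4, chosen only for illustration; the compiler picks its own factor and may arrange the remainder differently, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* As written: one element per iteration. */
static long ul_as_written(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-unrolled by 4: four elements per trip through the main loop,
   then a remainder loop for the last n % 4 elements. */
static long ul_after(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)            /* remainder */
        s += a[i];
    return s;
}

/* Both versions must return the same sum for the first n elements. */
static void ul_check(const int *a, size_t n)
{
    assert(ul_as_written(a, n) == ul_after(a, n));
}
```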

-funroll-all-loops

Perform the optimization of loop unrolling. This is done for all loops and usually makes programs run more slowly. -funroll-all-loops implies -fstrength-reduce, as well as -frerun-cse-after-loop.