Compiler Technology
Compiler Technology for Exploiting Modern Processors
To keep pace with Moore’s law and deliver 60% annual
increases in processor performance, architects have increased the
complexity of commodity processors and the memory systems that surround
them. To produce code that achieves a significant fraction of peak
performance on a modern commodity processor (e.g., Pentium, IA-64,
Opteron, SPARC, or MIPS), a compiler must apply a complex series of
transformations to the code (optimization) and then translate the
result into the appropriate assembly code (code generation). To create
code that executes efficiently, the compiler must address a number of
challenging problems:
- The code must keep the functional units busy. The optimizer must transform the input program so that it has enough instruction-level parallelism to sustain the computation rate, as well as an appropriate instruction mix. The code generator must then discover a dense instruction schedule for the final code; it may need to use different scheduling algorithms at different points in the code, making the choice on a loop-by-loop or block-by-block basis. (The first sketch after this list illustrates one transformation that exposes instruction-level parallelism.)
- The optimizer must transform the code so that its pattern of memory accesses matches the capabilities of the processor and memory system, adjusting locality with blocking, prefetching, and (perhaps) streaming. After the optimizer has rewritten the code so that it can move enough data onto the chip in a timely fashion, the code generator must manage instruction and data placement so that operands are kept in appropriate registers and, on clustered register-file machines, in the cluster where each operand is consumed. (The second sketch after this list shows a blocked loop nest with a prefetch hint.)
- Finally, the optimizer and the code generator must work together to make effective use of processor features such as predicated execution, register windows, register stacks, auto-increment addressing modes, branch-delay slots, and hints to the hardware about locality and branch targets. (The third sketch after this list shows the if-conversion that enables predicated execution.)
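To make the first item concrete, here is a minimal sketch of one transformation the optimizer might apply to expose instruction-level parallelism: unrolling a reduction loop and splitting the accumulator so that independent multiply-add chains can keep several functional units busy. The dot-product kernel, the function names, and the unroll factor of 4 are illustrative assumptions, not details taken from the text.

```c
#include <stddef.h>

/* Original loop: one long dependence chain through `sum`, so each
 * multiply-add must wait for the previous one to complete. */
double dot_naive(const double *x, const double *y, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Unrolled by 4 with 4 accumulators: the multiply-adds within an
 * iteration are independent, so the scheduler can issue them in
 * parallel and keep several functional units busy. */
double dot_unrolled(const double *x, const double *y, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)          /* cleanup loop for the remainder */
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);
}
```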
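For the second item, here is a minimal sketch of cache blocking combined with a software prefetch hint. The matrix-multiply kernel, the tile size `B`, and the use of the GCC/Clang `__builtin_prefetch` intrinsic are illustrative assumptions rather than details prescribed by the text.

```c
#include <stddef.h>

#define B 64  /* illustrative tile size, chosen to keep the working set in cache */

/* c = c + a * b, all n x n matrices stored row-major.
 * The three outer loops walk over B x B tiles so that each tile of
 * a and b is reused from cache many times before being evicted. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c) {
    for (size_t ii = 0; ii < n; ii += B)
        for (size_t kk = 0; kk < n; kk += B)
            for (size_t jj = 0; jj < n; jj += B)
                for (size_t i = ii; i < ii + B && i < n; i++)
                    for (size_t k = kk; k < kk + B && k < n; k++) {
                        double aik = a[i * n + k];
                        /* Hint: start bringing the next row of the b tile
                         * into cache while the current row is consumed. */
                        if (k + 1 < n)
                            __builtin_prefetch(&b[(k + 1) * n + jj], 0, 1);
                        for (size_t j = jj; j < jj + B && j < n; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
}
```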
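For the last item, here is a minimal sketch of if-conversion, one transformation that lets the code generator target predicated execution or conditional moves instead of a hard-to-predict branch. The function names and the counting kernel are illustrative.

```c
/* Branchy form: a data-dependent branch inside a hot loop. */
int count_above_branchy(const int *v, int n, int threshold) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] > threshold)
            count++;
    }
    return count;
}

/* If-converted form: the comparison result is folded into the data flow,
 * so the backend can emit a compare plus a predicated or conditional-move
 * sequence with no control dependence. */
int count_above_predicated(const int *v, int n, int threshold) {
    int count = 0;
    for (int i = 0; i < n; i++)
        count += (v[i] > threshold);   /* evaluates to 0 or 1, no branch */
    return count;
}
```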