
Compiler Technology

by admin last modified 2004-10-15 12:36

Compiler Technology for Exploiting Modern Processors

To keep pace with the Moore’s law curve and deliver 60% annual increases in processor performance, architects have increased the complexity of commodity processors and the memory systems that surround them. To produce code that achieves a significant fraction of peak performance on a modern commodity processor (e.g., Pentium, IA-64, Opteron, SPARC, or MIPS), a compiler must apply a complex series of transformations to the code (optimization) and then translate the result into the appropriate assembly code (code generation). To create code that executes efficiently, the compiler must address a number of challenging problems.
  1. The code must keep the functional units busy. The optimizer must transform the input program so that it has both enough instruction-level parallelism to sustain the computation rate and an appropriate instruction mix. The code generator must discover a dense instruction schedule for the final code; it may need to use different scheduling algorithms for different points in the code, making the choice on a loop-by-loop or block-by-block basis.
  2. The optimizer must transform the code so that its pattern of memory accesses suits the processor and memory system, adjusting locality with blocking, prefetching, and (perhaps) streaming. After the optimizer has rewritten the code so that it can move sufficient data onto the chip in a timely fashion, the code generator must manage instruction and data placement so that operands are kept in appropriate registers and, for clustered register-file machines, in the cluster where the operand is consumed.
  3. Finally, the optimizer and the code generator must work together to make effective use of processor features such as predicated execution, register windows, register stacks, auto-increment options, branch-delay slots, and hints to the hardware about locality and branch targets.
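As an illustration of the blocking transformation mentioned in item 2, the sketch below shows a matrix multiply before and after the loops are tiled by hand. It is not taken from this project: the function names, the row-major layout, and the tile size `TILE` are assumptions chosen for the example, and a real optimizer would derive the tile size from the cache parameters of the target machine.

```c
#include <stddef.h>

/* Hypothetical tile size, chosen only for illustration. */
#define TILE 32

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Reference version: C += A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Blocked version: the same computation, restructured so that each
 * group of inner loops touches only TILE x TILE sub-blocks of A, B,
 * and C, which can stay resident in cache while they are reused. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += TILE) {
        size_t i_end = min_size(ii + TILE, n);
        for (size_t kk = 0; kk < n; kk += TILE) {
            size_t k_end = min_size(kk + TILE, n);
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t j_end = min_size(jj + TILE, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k]; /* reused across the j loop */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions accumulate into `c`, so the caller is expected to zero it first. The blocked version handles matrix sizes that are not a multiple of `TILE` by clamping each tile's upper bound to `n`.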
Research on this project aims to develop new techniques that address these problems, techniques suitable for implementation in either open source or commercial compilers, and to improve the quality of optimization and code generation that such compilers provide for commodity processors used in high-performance computing.


LACSI Collaborators include:

Rice University, LANL, UH, UNM, UIUC, UNC, UTK