
Compiler Technology for Extreme-scale Systems

Today, MPI is the dominant programming model for writing scalable parallel programs. MPI has succeeded because it is ubiquitous and makes it possible to program a wide range of commodity systems efficiently. However, as a programming model for extreme-scale systems, MPI has numerous shortcomings. For instance, when using MPI, the programmer must assume all responsibility for communication performance, including choreographing asynchronous communication and overlapping it with computation. This complicates parallel programming significantly. Because of the explicit nature of MPI communication, significant compiler optimization of communication is impractical. Programming abstractions in which communication is not expressed in such a low-level form are better suited to having compiler optimization play a significant role in improving parallel performance. Also, with MPI only coarse-grained communication is efficient, which has a profound impact on the way programs are structured. When an architecture supports a global name space and fine-grained, low-latency communication, other program organizations can be more efficient.
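To make the burden described above concrete, the sketch below shows the kind of hand-written choreography MPI requires to overlap a halo exchange with independent computation. It is a minimal illustration in C; the buffer layout, neighbor ranks, and the compute_interior/compute_boundary routines are hypothetical placeholders, not taken from any particular application.

    /* Sketch: overlapping MPI communication with computation by hand.
     * Buffer layout, neighbors, and the compute_* routines are
     * illustrative placeholders. */
    #include <mpi.h>

    #define FACE 1024   /* number of doubles in one boundary face */

    void compute_interior(double *u);                      /* needs no halo data */
    void compute_boundary(double *u, const double *halo);  /* needs halo data    */

    /* u[0..FACE-1] is the left face, u[FACE..2*FACE-1] the right face;
     * halo has room for the two incoming faces. */
    void halo_step(double *u, double *halo, int left, int right, MPI_Comm comm)
    {
        MPI_Request req[4];

        /* 1. Post nonblocking receives and sends for the halo exchange. */
        MPI_Irecv(halo,        FACE, MPI_DOUBLE, left,  0, comm, &req[0]);
        MPI_Irecv(halo + FACE, FACE, MPI_DOUBLE, right, 1, comm, &req[1]);
        MPI_Isend(u,           FACE, MPI_DOUBLE, left,  1, comm, &req[2]);
        MPI_Isend(u + FACE,    FACE, MPI_DOUBLE, right, 0, comm, &req[3]);

        /* 2. Overlap: do the work that does not depend on incoming data. */
        compute_interior(u);

        /* 3. Only then wait for the messages and finish the boundary work. */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        compute_boundary(u, halo);
    }

Every step of this schedule (posting, overlapping, waiting) is the programmer's responsibility, and because it is written out explicitly, there is little a compiler can do to improve it.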

Global address space programming models are likely to emerge as the simplest to program and the most efficient for emerging systems such as Cray’s Red Storm and future systems that arise out of DARPA’s HPCS project. SPMD global address space programming models such as Co-array Fortran (CAF) and Unified Parallel C (UPC) offer promising near-term alternatives to MPI. Programming in these languages is simpler: one simply reads and writes shared variables. With communication and synchronization as part of the language, these languages are more amenable to compiler-directed communication optimization, which offers the potential for compilers to assist effectively in the development of high-performance programs. Research into compiler optimizations for SPMD programming languages promises not only to simplify parallel programming but also to yield superior performance, because compilers are suited to performing pervasive optimizations that application programmers would not employ manually because of their complexity. Also, because CAF and UPC are based on a shared-memory programming paradigm, they naturally lead to implementations that avoid copies where possible; this is important on modern computer systems because copies are costly.
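To make the contrast with MPI concrete, the fragment below is a minimal UPC-style sketch: data declared shared is simply read and written, and the communication this implies is left to the compiler and runtime rather than spelled out as message-passing calls. The array names, size, and use of upc_forall are purely illustrative.

    /* Minimal UPC sketch: communication is implicit in reads and writes
     * of shared data.  Names and sizes are illustrative only. */
    #include <upc.h>

    #define N 1024

    shared double a[N], b[N];   /* distributed across UPC threads by the language */

    void scale(double alpha)
    {
        int i;

        /* Each thread updates the elements with affinity to it; any remote
         * accesses are generated by the compiler/runtime, not written as
         * explicit sends and receives. */
        upc_forall (i = 0; i < N; i++; &a[i])
            a[i] = alpha * b[i];

        upc_barrier;            /* language-level synchronization */
    }

Because the remote accesses are visible to the compiler at this level, it is free to aggregate, prefetch, or overlap the underlying communication, which is exactly the kind of compiler-directed optimization argued for above.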

However, global address space languages such as UPC and CAF are relatively immature, as is the compiler technology to support them. Making these languages simple and efficient to use will require refining language primitives for efficiency and performance portability, developing new analyses and optimizations for SPMD programs, developing compiler support for tolerating latency and asynchrony, and developing supporting run-time mechanisms.

Beyond explicitly parallel SPMD programming models, data-parallel models such as High Performance Fortran (HPF) and Cray’s Chapel language offer an even simpler programming paradigm, but they require more sophisticated compilation techniques to yield high performance. Research into compiler technology that increases the performance and scalability of data-parallel programming languages, and that broadens their applicability, is important if parallel programs are to become significantly simpler to write. For parallel programming models to succeed, their use and appeal must extend beyond extreme-scale machines; therefore, sophisticated compiler technology is needed to make these languages perform well on today’s relatively loosely coupled clusters as well as on the tightly coupled petascale platforms of the future.

Higher-level data-parallel programming models such as HPF and Chapel pose significant challenges to compilers. Generating flexible high-performance code that runs effectively on a parameterized number of processors is a significant problem. We will continue to investigate analysis and code generation techniques with the aim of having compilers transform complex programs that use sophisticated algorithms into parallel programs that yield scalable high performance on a range of parallel systems.
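One small piece of that code-generation problem can be made concrete. For a block-distributed array, the compiler must emit ownership and loop-bound computations that remain correct for whatever process count is chosen at run time. The sketch below, in C, shows one conventional form of those computations; it illustrates the kind of code such a compiler generates and is not drawn from any particular HPF or Chapel implementation.

    /* Sketch: run-time ownership and local bounds for a block-distributed
     * array of n elements over p processes, of the kind a data-parallel
     * compiler must generate when p is not known until run time. */

    /* Elements per block when n elements are split over p processes. */
    int block_size(int n, int p) { return (n + p - 1) / p; }

    /* Which process owns global index i. */
    int owner(int i, int n, int p) { return i / block_size(n, p); }

    /* Local iteration bounds [lo, hi) of the global index space for `rank`. */
    void local_bounds(int rank, int n, int p, int *lo, int *hi)
    {
        int b = block_size(n, p);
        *lo = rank * b < n ? rank * b : n;             /* ranks past the end own nothing */
        *hi = (rank + 1) * b < n ? (rank + 1) * b : n;
    }

    /* Owner-computes loop over the locally owned slice, valid for any p. */
    void scale_local(double *a_local, double alpha, int rank, int n, int p)
    {
        int lo, hi;
        local_bounds(rank, n, p, &lo, &hi);
        for (int i = lo; i < hi; i++)
            a_local[i - lo] = alpha * a_local[i - lo];
    }

In HPF or Chapel the programmer writes only the distributed array and the whole-array operation; producing efficient, parameterized code of this kind, and far more complicated variants of it, is the compiler's job, which is why these compilation techniques matter.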

