Applications and System Performance
Building scientific applications that can effectively exploit
extreme-scale parallel systems has proven extremely difficult. The
sheer level of parallelism in such systems poses a formidable challenge
to achieving scalable performance. In addition, the architectural
complexity of extreme-scale systems makes it hard to write programs
that can fully exploit their capabilities. In today’s extreme-scale
systems, complex processors, deep memory hierarchies and heterogeneous
interconnects require careful scheduling of an application’s
operations, data accesses and communication to enable the application
to achieve a significant fraction of a system’s potential performance.
Furthermore, the large number of components in extreme-scale parallel
systems makes component failure inevitable; therefore, long-running
applications must be resilient to hardware faults or risk being unable
to run to completion.
The principal goals of the application and system performance research thrust are:
- understanding application and system performance on present-day extreme-scale architectures through the development and application of technologies for measurement and modeling of program and system behavior,
- devising software strategies to ameliorate application performance bottlenecks on today’s architectures,
- modeling the behavior of applications to understand factors affecting their scalability on future generations of extreme-scale systems, and
- investigating software technology that will enable higher performance on next-generation, extreme-scale parallel systems.
As part of this research thrust, the project team will explore application performance on many fronts and pursue a program of research to develop technologies for measuring, modeling, understanding, tuning, and steering application performance on current and future generations of extreme-scale parallel architectures. This work will address all aspects of performance and reliability, spanning system architecture, network, and applications. Our investigation will cover both scalability and node performance. The findings from this research, as well as the tools and software infrastructure developed in this effort, are expected to benefit all ASC application teams by providing them with more efficient programming models, technology for compiler-assisted tuning of applications, better performance instrumentation and diagnostic capabilities, insight into the performance and scaling of applications and systems through modeling, improved algorithm-architecture mapping, and better-performing extreme-scale parallel architectures.