Tuning
Automatic Application Tuning
Increased complexity in both applications and architectures has created
an environment in which producing effective code is difficult. The
classic software production cycle, in which an application is compiled
once at a high level of optimization, is no longer sufficient to
produce high-quality executable code. Systems that use run-time
adaptation, such as ATLAS, or that generate code tailored for specific
problem instances, such as UHFFT, demonstrate that adaptive strategies
can produce consistently good results. Building tools that incorporate
and automate such adaptation is a major challenge. The goal of this
project is to achieve results comparable to those of ATLAS or UHFFT
using automatic techniques—thereby making the benefits of such
adaptation available over a wider range of applications. (This goal
stands in contrast to the work in the Retargetable High-Performance Components and Libraries
Section, which aims to generate additional software libraries that
implement their own adaptive behavior. Success in this project
will complement success in that project.)
Adaptive Optimization Strategies: Modern compilers use a handful of strategies to improve each optimization. Typically, they apply the same strategies to all programs. For example, GCC supports three levels of optimization, -O1, -O2, and -O3, each representing a fixed strategy. Recent work has shown that program-specific strategies can produce consistently better code; several studies suggest that the improvements from program-specific strategies can reach 25% over any fixed strategy.
The problem with program-specific strategies lies in the cost of discovering them. One goal of this project is to develop cost-effective techniques to discover and apply program-specific optimization strategies. The work includes strategies for compiler configuration (e.g., both the set of optimizations to run and an order in which to apply them), for determining command-line parameter settings (e.g., GCC offers roughly fifty individual flags that can control different aspects of the individual passes), and for controlling the application of specific optimizations (e.g., loop blocking or inline substitution).
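To make the cost of configuration search concrete, the following sketch randomly samples subsets of a few real GCC flags, compiles and times a benchmark, and keeps the fastest configuration. The source file benchmark.c and the particular flag subset are assumptions for illustration; a production tuner would replace random sampling with a more informed search over the full flag set.

import random
import subprocess
import time

# A small sample of real GCC pass-control flags to search over; the full
# space described above (roughly fifty flags) would be handled the same way.
FLAGS = ["-funroll-loops", "-finline-functions", "-ftree-vectorize",
         "-fomit-frame-pointer", "-fno-strict-aliasing"]

def compile_and_time(flags, source="benchmark.c", runs=3):
    # Compile the (hypothetical) benchmark with the candidate flags, then
    # report the best wall-clock time over several runs.
    subprocess.run(["gcc", "-O2", *flags, source, "-o", "bench"], check=True)
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(["./bench"], check=True)
        best = min(best, time.perf_counter() - start)
    return best

def random_search(trials=20):
    # Randomly sample flag subsets, keeping the fastest configuration seen.
    best_flags, best_time = [], compile_and_time([])
    for _ in range(trials):
        candidate = [f for f in FLAGS if random.random() < 0.5]
        elapsed = compile_and_time(candidate)
        if elapsed < best_time:
            best_flags, best_time = candidate, elapsed
    return best_flags, best_time

Even this naive search requires one compile-and-run cycle per trial, which illustrates why cost-effective discovery techniques are a central goal of the project.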
Performance-based Optimization Strategies: Recent work in performance analysis and modeling has enabled tools to identify performance bottlenecks in an application. Tools such as HPCToolkit can use hardware performance counters to pinpoint both a region and a problem, as in “this inner loop has excessive L2 cache misses.” Classic optimizing compilers have no way to target such problems; in particular, techniques that ameliorate one region’s problem may exacerbate the problems of another region.
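As a simplified stand-in for HPCToolkit’s counter-based attribution, the sketch below measures whole-program cache behavior with the Linux perf tool and flags a symptom when the miss rate crosses a threshold. The binary name ./bench and the 10% threshold are assumptions for illustration; a real tool attributes counter data to individual source regions rather than the whole program.

import subprocess

def measure_counters(cmd, events=("cache-references", "cache-misses")):
    # Run cmd under "perf stat" in CSV mode (-x,); perf writes the counter
    # report to stderr as value,unit,event,... lines.
    perf_cmd = ["perf", "stat", "-x", ",", "-e", ",".join(events)] + list(cmd)
    result = subprocess.run(perf_cmd, capture_output=True, text=True)
    counts = {}
    for line in result.stderr.splitlines():
        fields = line.split(",")
        if len(fields) >= 3 and fields[0].isdigit():
            counts[fields[2]] = int(fields[0])
    return counts

counts = measure_counters(["./bench"])  # "./bench" is a placeholder binary
refs = max(counts.get("cache-references", 1), 1)
miss_rate = counts.get("cache-misses", 0) / refs
if miss_rate > 0.10:  # 10% threshold chosen only for illustration
    print("symptom: excessive cache misses ({:.1%})".format(miss_rate))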
Performance-based optimizations will combine a regional approach to applying a particular transformation (as opposed to uniform application across an entire procedure) with a feedback-based steering mechanism that selects transformations and regions based on actual or predicted performance problems.
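A minimal sketch of such a steering mechanism appears below. The symptom-to-transformation table and the measure, diagnose, and apply_transform callbacks are hypothetical placeholders for the machinery this project would build, not an existing API.

# Hypothetical mapping from diagnosed symptoms to candidate transformations.
SYMPTOM_TO_TRANSFORMS = {
    "excessive L2 cache misses": ["loop blocking", "array padding"],
    "excessive TLB misses": ["loop interchange", "data layout change"],
}

def steer(program, measure, diagnose, apply_transform, budget=10):
    # Feedback loop: find the worst region and its symptom, try the
    # transformations suggested for that symptom on that region only,
    # and keep a change only if measured performance improves.
    best_time = measure(program)
    for _ in range(budget):
        region, symptom = diagnose(program)  # e.g., from hardware counters
        if symptom is None:
            break
        for transform in SYMPTOM_TO_TRANSFORMS.get(symptom, []):
            candidate = apply_transform(program, region, transform)
            elapsed = measure(candidate)
            if elapsed < best_time:
                program, best_time = candidate, elapsed
                break
    return program

Because each transformation is applied to a single region and kept only when measurement confirms an improvement, the loop avoids the global trade-offs that make uniform, whole-procedure application unsafe.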
Research is needed into a spectrum of technologies to support effective whole-program tuning. This research will include techniques to identify regions of inefficiency and to pinpoint symptoms of inefficiency (e.g., excessive TLB misses in a particular loop), strategies for the coordinated application of integrated code transformations to ameliorate program bottlenecks, and search techniques that use the results of tuning attempts thus far to determine the next step.
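One candidate for the last of these is simple greedy hill climbing over tuning actions, sketched below under the assumption that each action is a deterministic function returning a transformed version of the program; all names here are illustrative, not part of any existing system.

def greedy_tuning_search(program, actions, measure):
    # Greedy hill climbing: at each step, evaluate every remaining action,
    # commit to the one with the best measured time, and stop when no
    # action improves on the current program.
    best_time = measure(program)
    remaining = list(actions)
    while remaining:
        scored = [(measure(act(program)), act) for act in remaining]
        elapsed, act = min(scored, key=lambda pair: pair[0])
        if elapsed >= best_time:  # local minimum: no action helps
            break
        program, best_time = act(program), elapsed
        remaining.remove(act)
    return program, best_time

Greedy search is only a starting point; the project would also evaluate strategies that escape local minima, such as backtracking or randomized restarts, guided by the accumulated tuning history.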