Guohua Jin and John Mellor-Crummey (2005)
Improving Performance by Reducing the Memory Footprint of Scientific Applications
International Journal of High Performance Computing Applications, 19(4):433-451.
Over the last two decades, processor speeds have improved much faster than memory speeds. As a result, memory access delay is a major performance bottleneck in today's systems. Compilers often fail to choreograph data and computation automatically to avoid memory access delay, so we have developed an annotation-driven source-to-source transformation tool for this purpose. The tool applies a set of compiler transformations that improve temporal reuse in scientific applications (1) by reducing the size of temporary arrays and (2) by overlaying storage for multiple temporary arrays that are not live at the same time. We also describe two supporting transformations, statement motion and loop alignment, that improve the effectiveness of storage reduction. Our experiments with a numerical kernel and two weather codes show that our storage reduction optimizations amplify the benefits of loop transformations and double the performance achievable with loop transformations alone.
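
The two storage reductions named in the abstract can be illustrated with a minimal hand-written sketch. It is not the authors' tool or its output: the function names, the arithmetic, and the choice of Python are all assumptions made for illustration. The first pair of functions shows a full-size temporary array shrinking to two scalars once the producer and consumer loops are aligned and fused; the last function shows two temporaries that are never live at the same time sharing one buffer.

```python
def stencil_full_temporary(a):
    """Baseline: a producer loop fills a full-size temporary t,
    and a separate consumer loop reads t[i] and t[i - 1]."""
    n = len(a)
    t = [0.0] * n                      # temporary array of size n
    for i in range(n):
        t[i] = 2.0 * a[i]
    out = [0.0] * n
    for i in range(1, n):
        out[i] = t[i] + t[i - 1]
    return out


def stencil_reduced_temporary(a):
    """Hypothetical reduced form: the loops are fused, and because the
    consumer only needs t[i] and t[i - 1], the temporary shrinks from
    n elements to two scalars (cur and prev)."""
    n = len(a)
    out = [0.0] * n
    prev = 2.0 * a[0] if n else 0.0    # plays the role of t[i - 1]
    for i in range(1, n):
        cur = 2.0 * a[i]               # plays the role of t[i]
        out[i] = cur + prev
        prev = cur
    return out


def two_phases_overlaid(a):
    """Hypothetical overlay: the temporaries t1 and t2 are not live at
    the same time, so both are stored in the single buffer buf."""
    n = len(a)
    buf = [0.0] * n
    for i in range(n):                 # phase 1: buf holds t1
        buf[i] = a[i] + 1.0
    s1 = sum(buf)                      # last use of t1
    for i in range(n):                 # phase 2: buf is reused as t2
        buf[i] = a[i] * a[i]
    s2 = sum(buf)
    return s1, s2
```

Both stencil functions compute the same result; the reduced version touches far less memory per iteration, which is the kind of temporal-reuse improvement the abstract describes. In a compiled scientific code the same idea would apply to Fortran or C arrays rather than Python lists.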