John Mellor-Crummey and John Garvin (2004)
Optimizing Sparse Matrix-Vector Product Computations Using Unroll And Jam
International Journal on High Performance Computing Applications, Volume 18(2):pp. 225-236(12).
Large-scale scientific applications frequently compute sparse matrix-vector products in their computational core. For this reason, techniques for computing sparse matrix-vector products efficiently on modern architectures are important. In this paper we describe a strategy for improving the performance of sparse matrix-vector product computations using a loop transformation known as unrollandjam. We describe a novel sparse matrix representation that enables us to apply this transformation. Our approach is best suited for sparse matrices that have rows with a small number of predictable lengths. This work was motivated by sparse matrices that arise in SAGE, an application from Los Alamos National Laboratory. We evaluate the performance benefits of our approach using sparse matrices produced by SAGE for a pair of sample inputs. We show that our strategy is effective for improving sparse matrix-vector product performance using these matrices on MIPS R12000, Alpha Ev67, IBM Power 3, and Itanium 2 processors. Our measurements show that for this class of sparse matrices, our strategy improves sparse matrix-vector product performance from a low of 41\% on MIPS to well over a factor of 2 on Itanium.