John Mellor-Crummey and John Garvin (2002)
Optimizing Sparse Matrix Vector Product Computations Using Unroll and Jam
In: Proceedings of the LACSI Symposium, Los Alamos, NM, Los Alamos Computer Science Institute.
Large-scale scientific applications frequently compute sparse matrix vector products in their computational core. For this reason, techniques for computing sparse matrix vector products efficiently on modern architectures are important. This paper describes a strategy for improving the performance of sparse matrix vector product computations using a loop transformation known as unroll-and-jam. We describe a novel sparse matrix representation that enables us to apply this transformation. Our approach is best suited for sparse matrices that have rows with a small number of predictable lengths. This work was motivated by sparse matrices that arise in SAGE, an ASCI application from Los Alamos National Laboratory. We evaluate the performance benefits of our approach using sparse matrices produced by SAGE for a pair of sample inputs. We showthat our strategy is effective for improving sparse matrix vector product performance using these matrices on MIPS R12000, Alpha Ev67, IBM Power 3 and Itanium processors. Our measurements showthat for this class of sparse matrices, our strategy improves sparse matrix vector product performance from a low of 11\% on MIPS to well over a factor of two on Itanium.