Wolfson-Pou J, Laukemann J, Petrini F (2025)
Publication Type: Conference contribution
Publication year: 2025
Pages Range: 442-457
Conference Proceedings Title: ICS '25: Proceedings of the 39th ACM International Conference on Supercomputing
Event location: Salt Lake City
Information & Contributors Bibliometrics & Citations View Options References48 Figures Tables Share Sparse general matrix-matrix multiplication (SpGEMM) is a critical operation in many applications. Current multithreaded implementations are based on Gustavson’s algorithm and often perform poorly on large matrices due to limited cache reuse by the accumulators. We present MAGNUS (Matrix Algebra for Gigantic NUmerical Systems), a novel algorithm to maximize data locality in SpGEMM. To generate locality, MAGNUS reorders the intermediate product into discrete cache-friendly chunks using a two-level hierarchical approach. The accumulator is applied to each chunk, where the chunk size is chosen such that the accumulator is cache-efficient. MAGNUS is input- and system-aware: based on the matrix characteristics and target system specifications, the optimal number of chunks is computed by minimizing the storage cost of the necessary data structures. MAGNUS allows for a hybrid accumulation strategy in which each chunk uses a different accumulator based on an input threshold. We consider two accumulators: an AVX-512 vectorized bitonic sorting algorithm and classical dense accumulation. An OpenMP implementation of MAGNUS is compared with several baselines, including Intel MKL, for a variety of different matrices on three Intel architectures. For matrices from the SuiteSparse collection, MAGNUS is faster than all the baselines in most cases and is often an order of magnitude faster than at least one baseline. For massive random matrices, MAGNUS scales to the largest matrix sizes, while the baselines do not. Furthermore, MAGNUS is close to the optimal bound for these matrices, regardless of the matrix size, structure, and density.
APA:
Wolfson-Pou, J., Laukemann, J., & Petrini, F. (2025). MAGNUS: Generating Data Locality to Accelerate Sparse Matrix-Matrix Multiplication on CPUs. In ICS '25: Proceedings of the 39th ACM International Conference on Supercomputing (pp. 442-457). Salt Lake City.
MLA:
Wolfson-Pou, Jordi, Jan Laukemann, and Fabrizio Petrini. "MAGNUS: Generating Data Locality to Accelerate Sparse Matrix-Matrix Multiplication on CPUs." Proceedings of the ICS '25: 2025 International Conference on Supercomputing Salt Lake City USA, Salt Lake City 2025. 442-457.
BibTeX: Download