Laukemann J, Hager G, Wellein G (2025)
Publication Type: Conference contribution
Publication year: 2025
Publisher: IEEE
City/Town: New York City
Pages Range: 1405-1412
Conference Proceedings Title: SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
DOI: 10.1109/SCW63240.2024.00181
With Nvidia’s release of the Grace Superchip, all three big semiconductor companies in HPC (AMD, Intel, Nvidia) are currently competing in the race for the best CPU. In this work we analyze the performance of these state-of-the-art CPUs and create an accurate in-core performance model for their microarchitectures Zen 4, Golden Cove, and Neoverse V2, extending the Open Source Architecture Code Analyzer (OSACA) tool and comparing it with LLVM-MCA. Starting from the peculiarities and up- and downsides of a single core, we extend our comparison by a variety of microbenchmarks and the capabilities of a full node. The "write-allocate (WA) evasion" feature, which can automatically reduce the memory traffic caused by write misses, receives special attention; we show that the Grace Superchip has a next-to-optimal implementation of WA evasion, and that the only way to avoid write allocates on Zen 4 is the explicit use of non-temporal stores.
APA:
Laukemann, J., Hager, G., & Wellein, G. (2024). Microarchitectural comparison and in-core modeling of state-of-the-art CPUs: Grace, Sapphire Rapids, and Genoa. In SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1405-1412). Atlanta, US. New York City: IEEE.
MLA:
Laukemann, Jan, Georg Hager, and Gerhard Wellein. "Microarchitectural comparison and in-core modeling of state-of-the-art CPUs: Grace, Sapphire Rapids, and Genoa." Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta. New York City: IEEE, 2024. 1405-1412.