Laukemann J, Hager G, Wellein G (2026)
Publication Type: Journal article
Publication year: 2026
Book Volume: 127
Article Number: 103183
DOI: 10.1016/j.parco.2026.103183
Three big semiconductor companies in HPC are currently competing in the race for the best CPU: AMD, Intel, and NVIDIA. There are significant differences among their state-of-the-art CPU designs, spanning the entire range from instruction execution to cache behavior and main memory bandwidth. In this work, we analyze the performance of CPUs based on the Zen 4, Golden Cove, and Neoverse V2 microarchitectures. We create accurate in-core performance models for use with the Open Source Architecture Code Analyzer (OSACA) tool and compare its prediction accuracy with llvm-mca. Beyond the tool aspect, this reveals interesting differences in in-core design points but also some commonalities. Beyond the single core, we extend our comparison by measuring data-transfer behavior through the memory hierarchy using a variety of microbenchmarks. We thoroughly investigate the “write-allocate (WA) evasion” feature, which can automatically reduce the memory traffic caused by write misses. We show that the Grace Superchip has a next-to-optimal implementation of WA evasion while the Sapphire Rapids CPU can avoid write allocates completely only in specific scenarios. The only way to eliminate WAs on AMD Genoa is the explicit use of non-temporal stores. Finally, we study the cache hierarchy of the CPUs in view of the Execution-Cache-Memory (ECM) performance model, revealing overlapping cache hierarchies on Genoa and Grace in contrast to Sapphire Rapids.
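The effect of write-allocate evasion on memory traffic can be illustrated with a simple byte count. The following is a minimal sketch and is not taken from the paper: it assumes double-precision streaming kernels whose working set far exceeds the caches, and the helper name `memory_traffic_per_iteration` and its parameters are hypothetical. It only counts bytes per loop iteration and does not model any of the CPUs discussed.

```python
# Hypothetical helper (not from the paper): bytes moved between main memory
# and the CPU per loop iteration of a streaming kernel on double-precision data.
BYTES_PER_DOUBLE = 8


def memory_traffic_per_iteration(loads, stores, write_allocate):
    """Return the bytes transferred per iteration.

    loads          -- number of arrays that are read per iteration
    stores         -- number of arrays that are written but not read
                      (each store is assumed to miss the cache)
    write_allocate -- True if a store miss first reads the target cache
                      line from memory before it is overwritten
    """
    read_bytes = loads * BYTES_PER_DOUBLE
    write_bytes = stores * BYTES_PER_DOUBLE
    # A write-allocate adds one extra read of the same size for every store.
    wa_bytes = stores * BYTES_PER_DOUBLE if write_allocate else 0
    return read_bytes + write_bytes + wa_bytes


# Copy kernel a[i] = b[i]: one load stream, one store stream.
copy_with_wa = memory_traffic_per_iteration(1, 1, write_allocate=True)
copy_without_wa = memory_traffic_per_iteration(1, 1, write_allocate=False)

# Store-only kernel a[i] = s: no load stream, one store stream.
init_with_wa = memory_traffic_per_iteration(0, 1, write_allocate=True)
init_without_wa = memory_traffic_per_iteration(0, 1, write_allocate=False)

print(copy_with_wa, copy_without_wa)  # 24 16
print(init_with_wa, init_without_wa)  # 16 8
```

Under these assumptions, avoiding the write-allocate reduces the traffic of the copy kernel by a third and that of the store-only kernel by half, which is why WA evasion or non-temporal stores matter for memory-bound loops.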
APA:
Laukemann, J., Hager, G., & Wellein, G. (2026). Microarchitectural comparison, in-core modeling, and memory hierarchy analysis of state-of-the-art CPUs: Grace, Sapphire Rapids, and Genoa. Parallel Computing, 127, Article 103183. https://doi.org/10.1016/j.parco.2026.103183
MLA:
Laukemann, Jan, Georg Hager, and Gerhard Wellein. "Microarchitectural comparison, in-core modeling, and memory hierarchy analysis of state-of-the-art CPUs: Grace, Sapphire Rapids, and Genoa." Parallel Computing 127 (2026).