Klöffel T, Mathias G, Meyer B (2021)
Publication Type: Journal article
Publication year: 2021
Book Volume: 260
Article Number: 107745
DOI: 10.1016/j.cpc.2020.107745
In today's supercomputer hardware, the compute power of individual nodes grows much faster than the speed of their interconnects. To benefit from this evolution, the challenge in modernizing simulation software is to increase the computational load on the nodes while simultaneously reducing inter-node communication. Here, we demonstrate the implementation of such a strategy for plane-wave-based electronic structure methods and ab initio molecular dynamics (AIMD) simulations. Our focus is on ultra-soft pseudopotentials (USPP), since they allow workload to be shifted from fast Fourier transforms (FFTs) to highly node-efficient matrix–matrix multiplications. For communication-intensive routines, such as the multiple distributed 3-d FFTs of the electronic states and the distributed matrix–matrix multiplications related to the β-projectors of the pseudopotentials, parallel MPI+OpenMP algorithms are revised to overlap computation and communication. The necessary partitioning of the workload is optimized by auto-tuning algorithms. In addition, the largest global MPI_Allreduce operation is replaced by highly tuned, node-local parallelized operations using MPI shared-memory windows to avoid inter-node communication. A batched algorithm for the multiple 3-d FFTs improves the throughput of the MPI_Alltoall communication and thus the scalability of the implementation, both for USPP and for the frequently used norm-conserving pseudopotentials. The new algorithms have been implemented in the AIMD program CPMD (www.cpmd.org). The enhanced performance and scalability of the code are demonstrated on simulations of liquid water with up to 2048 molecules. It is shown that 100 ps simulations with many hundred water molecules can now be done routinely within a few days on a moderate number of nodes.
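The abstract's claim that batching the 3-d FFTs improves MPI_Alltoall throughput can be illustrated with a toy latency-bandwidth cost model. This is only a sketch under stated assumptions and is not taken from CPMD. The function names, the pairwise-exchange message count, and the latency and bandwidth figures in the usage note are all hypothetical. The point is that batching leaves the total data volume unchanged but sends it in fewer, larger messages, so the per-message latency is paid less often.

```python
def alltoall_time(n_ranks, bytes_per_rank, latency_s, bandwidth_bytes_per_s):
    """Estimate the time of one all-to-all exchange (toy model).

    Assumption: each rank sends one message to every other rank, and each
    message costs a fixed latency plus its size divided by the bandwidth.
    Real MPI_Alltoall implementations may use different algorithms.
    """
    n_msgs = n_ranks - 1
    msg_bytes = bytes_per_rank / n_ranks
    return n_msgs * (latency_s + msg_bytes / bandwidth_bytes_per_s)


def total_transpose_time(n_states, batch_size, n_ranks,
                         bytes_per_state_per_rank,
                         latency_s, bandwidth_bytes_per_s):
    """Sum the all-to-all cost over all states, exchanged in batches.

    batch_size = 1 models one exchange per state; larger values model
    packing several states into a single, larger exchange.
    """
    total = 0.0
    remaining = n_states
    while remaining > 0:
        current = min(batch_size, remaining)  # last batch may be smaller
        total += alltoall_time(n_ranks,
                               current * bytes_per_state_per_rank,
                               latency_s, bandwidth_bytes_per_s)
        remaining -= current
    return total
```

With hypothetical values such as `total_transpose_time(64, 8, 16, 1 << 20, 2e-6, 10e9)` versus the same call with a batch size of 1, the model's bandwidth term is identical and the two results differ only in the number of latency payments. The model ignores effects that matter in practice, such as network contention and the overlap of computation with communication.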
APA:
Klöffel, T., Mathias, G., & Meyer, B. (2021). Integrating state of the art compute, communication, and autotuning strategies to multiply the performance of ab initio molecular dynamics on massively parallel multi-core supercomputers. Computer Physics Communications, 260. https://doi.org/10.1016/j.cpc.2020.107745
MLA:
Klöffel, Tobias, Gerald Mathias, and Bernd Meyer. "Integrating state of the art compute, communication, and autotuning strategies to multiply the performance of ab initio molecular dynamics on massively parallel multi-core supercomputers." Computer Physics Communications 260 (2021).