Afzal A, Hager G, Wellein G, Markidis S (2023)
Publication Language: English
Publication Type: Conference contribution
Publication year: 2023
Publisher: Springer, Cham
Book Volume: 13826
Conference Proceedings Title: Lecture Notes in Computer Science
Event location: Gdansk, Poland
DOI: 10.1007/978-3-031-30442-2_12
Open Access Link: https://link.springer.com/chapter/10.1007/978-3-031-30442-2_12
This paper studies the utility of using data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs. To this end, we run microbenchmarks and realistic proxy applications with the regular compute-communicate structure on two different supercomputing platforms and choose the per-process performance and MPI time per time step as relevant observables. Using principal component analysis, clustering techniques, correlation functions, and a new “phase space plot,” we show how desynchronization patterns (or lack thereof) can be readily identified from a data set that is much smaller than a full MPI trace. Our methods also lead the way towards a more general classification of parallel program dynamics.
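As a rough illustration of the kind of observable the abstract mentions, the sketch below computes a simple correlation function over per-process MPI time per time step. It is not the authors' implementation: the data are synthetic (an idealized wave travelling across ranks), the choice of Pearson correlation as a function of rank distance is an assumption made for this example, and the names `pearson`, `correlation_vs_rank_distance`, and `mpi_time` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # convention chosen here for constant series
    return cov / math.sqrt(var_x * var_y)


def correlation_vs_rank_distance(mpi_time):
    """Mean correlation of MPI-time series between ranks that are d apart.

    mpi_time[r][t] is the (assumed) MPI time of rank r in time step t.
    Returns a dict mapping rank distance d to the mean Pearson correlation.
    """
    n_ranks = len(mpi_time)
    result = {}
    for d in range(1, n_ranks):
        values = [pearson(mpi_time[r], mpi_time[r + d])
                  for r in range(n_ranks - d)]
        result[d] = sum(values) / len(values)
    return result


# Synthetic, idealized data (assumption): a delay wave that shifts by one
# eighth of its period from one rank to the next.
n_ranks, n_steps = 8, 64
mpi_time = [
    [1.0 + math.sin(2 * math.pi * (t / 16 - r / 8)) for t in range(n_steps)]
    for r in range(n_ranks)
]

corr = correlation_vs_rank_distance(mpi_time)
for d, c in corr.items():
    print(f"rank distance {d}: mean correlation {c:+.3f}")
```

In this toy setting, a correlation that oscillates with rank distance reflects the wave-like pattern in the synthetic data, whereas identical series on all ranks would give a correlation of 1 at every distance. The input is only one number per process and time step, which is far smaller than a full MPI trace.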
APA:
Afzal, A., Hager, G., Wellein, G., & Markidis, S. (2023). Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications. In Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (Eds.), Lecture Notes in Computer Science. Gdansk, Poland: Springer, Cham.
MLA:
Afzal, Ayesha, et al. "Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications." Proceedings of the 14th International Conference on Parallel Processing and Applied Mathematics, PPAM 2022, Gdansk, Poland. Ed. Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. Springer, Cham, 2023.