Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact

Afzal A, Hager G, Wellein G (2021)


Publication Status: Published

Publication Type: Conference contribution

Publication year: 2021

Publisher: Springer Science and Business Media Deutschland GmbH

Book Volume: 12728 LNCS

Page Range: 351-371

Conference Proceedings Title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Event location: Virtual, Online

ISBN: 9783030787127

DOI: 10.1007/978-3-030-78713-4_19

Abstract

Most distributed-memory bulk-synchronous parallel programs in HPC assume that compute resources are available continuously and homogeneously across the allocated set of compute nodes. However, long one-off delays on individual processes can cause global disturbances, so-called idle waves, by rippling through the system. This process is mainly governed by the communication topology of the underlying parallel code. This paper makes significant contributions to the understanding of idle wave dynamics. We study the propagation mechanisms of idle waves across the processes of MPI-parallel programs. We present a validated analytic model for their propagation velocity with respect to communication parameters and topology, with a special emphasis on sparse communication patterns. We study the interaction of idle waves with MPI collectives and show that, depending on the implementation, a collective may be permeable to the wave. Finally, we analyze two mechanisms of idle wave decay: topological decay, which is rooted in differences in communication characteristics among parts of the system, and noise-induced decay, which is caused by system or application noise. We show that noise-induced decay is largely independent of noise characteristics but depends only on the overall noise power. An analytic expression for the idle wave decay rate with respect to noise power is derived. For model validation, we use microbenchmarks and stencil algorithms on three different supercomputing platforms.

How to cite

APA:

Afzal, A., Hager, G., & Wellein, G. (2021). Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact. In Bradford L. Chamberlain, Ana-Lucia Varbanescu, Hatem Ltaief, Piotr Luszczek (Eds.), Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 351-371). Virtual, Online: Springer Science and Business Media Deutschland GmbH.

MLA:

Afzal, Ayesha, Georg Hager, and Gerhard Wellein. "Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact." Proceedings of the 36th International Conference on High Performance Computing, ISC High Performance 2021, Virtual, Online. Ed. Bradford L. Chamberlain, Ana-Lucia Varbanescu, Hatem Ltaief, Piotr Luszczek. Springer Science and Business Media Deutschland GmbH, 2021. 351-371.