Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study

Afzal A, Hager G, Wellein G (2019)


Publication Type: Conference contribution

Publication year: 2019

Publisher: Institute of Electrical and Electronics Engineers Inc.

Book Volume: 2019-September

Conference Proceedings Title: Proceedings - IEEE International Conference on Cluster Computing, ICCC

Event location: Albuquerque, NM, US

ISBN: 9781728147345

DOI: 10.1109/CLUSTER.2019.8890995

Abstract

Analytic, first-principles performance modeling of distributed-memory applications is difficult due to a wide spectrum of random disturbances caused by the application and the system. These disturbances (commonly called 'noise') run contrary to the assumptions about regularity that one usually employs when constructing simple analytic models. Despite numerous efforts to quantify, categorize, and reduce such effects, a comprehensive quantitative understanding of their performance impact is not available, especially for long, one-off delays of execution periods that have global consequences for the parallel application. In this work, we investigate various traces collected from synthetic benchmarks that mimic real applications on simulated and real message-passing systems in order to pin-point the mechanisms behind delay propagation. We analyze the dependence of the propagation speed of 'idle waves,' i.e., propagating phases of inactivity, emanating from injected delays with respect to the execution and communication properties of the application, study how such delays decay under increased noise levels, and how they interact with each other. We also show how fine-grained noise can make a system immune against the adverse effects of propagating idle waves. Our results contribute to a better understanding of the collective phenomena that manifest themselves in distributed-memory parallel applications.

How to cite

APA:

Afzal, A., Hager, G., & Wellein, G. (2019). Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study. In Proceedings - IEEE International Conference on Cluster Computing, ICCC. Albuquerque, NM, US: Institute of Electrical and Electronics Engineers Inc.

MLA:

Afzal, Ayesha, Georg Hager, and Gerhard Wellein. "Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study." Proceedings of the 2019 IEEE International Conference on Cluster Computing, CLUSTER 2019, Albuquerque, NM: Institute of Electrical and Electronics Engineers Inc., 2019.
