Resilience for Massively Parallel Multigrid Solvers

Journal article
(Original article)


Publication details

Author(s): Huber M, Gmeiner B, Rüde U, Wohlmuth BI
Journal: SIAM Journal on Scientific Computing
Publisher: Society for Industrial and Applied Mathematics
Year of publication: 2016
Volume: 38
Issue: 5
Pages: 217-239
ISSN: 1064-8275
Language: English


Abstract


Fault tolerant massively parallel multigrid methods for elliptic partial differential equations are a step towards resilient solvers. Here, we combine domain partitioning with geometric multigrid methods to obtain fast and fault-robust solvers for three-dimensional problems. The recovery strategy is based on the redundant storage of ghost values, as they are commonly used in distributed memory parallel programs. In the case of a fault, the redundant interface values can be easily recovered, while the lost inner unknowns are recomputed approximately with recovery algorithms using multigrid cycles for solving a local Dirichlet problem. Different strategies are compared and evaluated with respect to performance, computational cost, and speedup. Especially effective are asynchronous strategies combining global solves with accelerated local recovery. By this, multiple faults can be fully compensated with respect to both the number of iterations and run-time. For illustration, we use a state-of-the-art petascale supercomputer to study failure scenarios when solving systems with up to 6 · 10^11 (0.6 trillion) unknowns.
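
The recovery idea described in the abstract can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the problem is a 1D Poisson model problem rather than a three-dimensional one, plain Gauss-Seidel sweeps stand in for multigrid cycles, and all names (solve_dirichlet_gs, lo, hi, and so on) are hypothetical. It only shows the principle that surviving interface values serve as Dirichlet data for a local re-solve of the lost inner unknowns.

```python
def solve_dirichlet_gs(f, left, right, h, sweeps):
    """Gauss-Seidel sweeps for -u'' = f on a 1D patch.

    `left` and `right` are the Dirichlet values just outside the patch.
    Gauss-Seidel is only a simple stand-in for the multigrid cycles
    mentioned in the abstract (assumption for this sketch).
    """
    m = len(f)
    u = [0.0] * m  # lost data: restart from zero
    for _ in range(sweeps):
        for i in range(m):
            ul = u[i - 1] if i > 0 else left
            ur = u[i + 1] if i < m - 1 else right
            u[i] = 0.5 * (ul + ur + h * h * f[i])
    return u


# Global model problem: -u'' = 1 on (0, 1) with u(0) = u(1) = 0.
n = 31                      # number of interior grid points (hypothetical size)
h = 1.0 / (n + 1)
f = [1.0] * n
u = solve_dirichlet_gs(f, 0.0, 0.0, h, 5000)  # converged reference solution

# Simulated fault: the inner unknowns with indices lo..hi-1 are lost.
lo, hi = 10, 21
damaged = list(u)
for i in range(lo, hi):
    damaged[i] = 0.0

# The neighbouring values at lo-1 and hi survive, playing the role of the
# redundantly stored ghost/interface values. They act as Dirichlet data
# for a local solve that recomputes the lost unknowns.
recovered = list(damaged)
recovered[lo:hi] = solve_dirichlet_gs(
    f[lo:hi], damaged[lo - 1], damaged[hi], h, 500
)

err_before = max(abs(a - b) for a, b in zip(damaged, u))
err_after = max(abs(a - b) for a, b in zip(recovered, u))
```

In this toy setting the local solve reproduces the lost values because the interface data are already converged. In an unconverged global iteration the recovery would only be approximate, which matches the abstract's wording that the inner unknowns are "recomputed approximately".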



FAU authors / FAU editors

Gmeiner, Björn Dr.-Ing.
Lehrstuhl für Informatik 10 (Systemsimulation)
Huber, Markus
Lehrstuhl für Informatik 10 (Systemsimulation)
Rüde, Ulrich Prof. Dr.
Lehrstuhl für Informatik 10 (Systemsimulation)


Author(s) of external institution(s)
Technische Universität München (TUM)


How to cite

APA:
Huber, M., Gmeiner, B., Rüde, U., & Wohlmuth, B.I. (2016). Resilience for Massively Parallel Multigrid Solvers. SIAM Journal on Scientific Computing, 38(5), 217-239. https://dx.doi.org/10.1137/15M1026122

MLA:
Huber, Markus, et al. "Resilience for Massively Parallel Multigrid Solvers." SIAM Journal on Scientific Computing 38.5 (2016): 217-239.

Last updated 2018-01-07 at 12:10