Asynchronous Checkpointing by Dedicated Checkpoint Threads

Contribution to an edited volume


Publication details

Authors: Shahzad F, Wittmann M, Zeiser T, Wellein G
Title of edited volume: Recent Advances in the Message Passing Interface
Publisher: Springer-Verlag
Place of publication: -
Year of publication: 2012
Series title: Lecture Notes in Computer Science
Volume: 7490
Pages: 289-290
ISBN: 978-3-642-33517-4
ISSN: 0302-9743
Language: English


Abstract


Checkpoint/restart (C/R) is a classical approach to introduce fault tolerance in large HPC applications. Although it is relatively easy as compared to other fault tolerance approaches, its overhead hinders its wide usage. We present an application-level checkpointing technique that significantly reduces the checkpoint overhead. The checkpoint I/O is overlapped with the computation of the application by following a two-stage checkpointing mechanism with dedicated threads for doing I/O. © 2012 Springer-Verlag.
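
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `AsyncCheckpointer`, the function `run`, the pickle-based serialization, and the file naming are all hypothetical choices made for illustration. Stage one takes a fast in-memory snapshot of the application state while the computation is paused; stage two hands that snapshot to a dedicated thread that writes it to disk while the computation continues.

```python
import os
import pickle
import tempfile
import threading


class AsyncCheckpointer:
    """Hypothetical two-stage checkpointer: in-memory copy, then background write."""

    def __init__(self, directory):
        self.directory = directory
        self._writer = None  # dedicated checkpoint thread, if one is active

    def checkpoint(self, state, step):
        # Allow only one outstanding checkpoint: wait for the previous write.
        self.wait()
        # Stage 1 (blocking, but fast): snapshot the state into a memory buffer
        # so the application may modify `state` again right away.
        snapshot = pickle.dumps(state)
        path = os.path.join(self.directory, f"ckpt_{step}.pkl")
        # Stage 2 (asynchronous): a dedicated thread performs the slow file I/O.
        self._writer = threading.Thread(target=self._write, args=(snapshot, path))
        self._writer.start()
        return path

    @staticmethod
    def _write(snapshot, path):
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(snapshot)
            f.flush()
            os.fsync(f.fileno())
        # Rename only after the data is on disk, so a restart never
        # reads a partially written checkpoint.
        os.replace(tmp, path)

    def wait(self):
        # Block until the outstanding checkpoint write (if any) has finished.
        if self._writer is not None:
            self._writer.join()
            self._writer = None


def run(steps=6, interval=2, directory=None):
    """Toy compute loop that checkpoints every `interval` steps."""
    directory = directory or tempfile.mkdtemp()
    ckpt = AsyncCheckpointer(directory)
    state = {"step": 0, "values": [0.0] * 4}
    last = None
    for step in range(1, steps + 1):
        # Stand-in for the real computation.
        state["values"] = [v + 1.0 for v in state["values"]]
        state["step"] = step
        if step % interval == 0:
            last = ckpt.checkpoint(state, step)
    ckpt.wait()  # make sure the final checkpoint is complete
    with open(last, "rb") as f:
        return pickle.load(f)  # simulate a restart from the last checkpoint
```

In this sketch the application only pays for the in-memory copy; the file write overlaps with subsequent compute steps. A real MPI application would additionally need to coordinate checkpoints across processes, which is omitted here.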



FAU authors / FAU editors

Shahzad, Faisal
Regionales Rechenzentrum Erlangen (RRZE)
Wellein, Gerhard Prof. Dr.
Professur für Höchstleistungsrechnen
Wittmann, Markus
Regionales Rechenzentrum Erlangen (RRZE)
Zeiser, Thomas Dr.
Regionales Rechenzentrum Erlangen (RRZE)


How to cite

APA:
Shahzad, F., Wittmann, M., Zeiser, T., & Wellein, G. (2012). Asynchronous Checkpointing by Dedicated Checkpoint Threads. In Recent Advances in the Message Passing Interface (pp. 289-290). Springer-Verlag.

MLA:
Shahzad, Faisal, et al. "Asynchronous Checkpointing by Dedicated Checkpoint Threads." Recent Advances in the Message Passing Interface. Springer-Verlag, 2012. 289-290.

Last updated 2019-07-23 at 07:19