An efficient, dynamically adaptive method to tolerate transient faults in multi-core systems

Conference contribution
(Conference paper)


Publication details

Author(s): Aliee H, Zarandi HR
Year of publication: 2011
Proceedings: Proceedings of the 13th European Workshop on Dependable Computing (EWDC '11)
Pages: 53-58
ISBN: 9781450302845


Abstract


This paper presents an adaptive, CPU-aware fault detection and recovery approach that dynamically modifies the number of replicas in the system. The technique uses otherwise unused resources as redundancy, is transparent to users, and requires no modification to the application. It exploits the fact that, although future product designs are dedicated to multi-cores, applications exhibit poor parallelism, so CPU resources are underutilized and can be employed for fault tolerance. The system status is monitored periodically at runtime, and a set of redundant processes is created per application. To prevent performance degradation, the redundant processes are scheduled dynamically. The technique is most beneficial as the number of cores increases or when the application is I/O-based and leaves much of the CPU underutilized. Experimental results on a real quad-core system show that, on average, applications from standard benchmark suites such as SPLASH-2, PARSEC, and others use less than 20% of the CPU, which allows high fault detection and recovery with almost 10% performance overhead. Copyright © 2011 ACM.
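
The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea (size the redundancy from idle CPU capacity, then compare replica outputs), not the authors' method. The function names `replica_count` and `vote`, the cap of two extra replicas, and the majority-vote rule are all assumptions made for illustration.

```python
from collections import Counter


def replica_count(cpu_utilization, num_cores, max_replicas=2):
    """Hypothetical policy: one extra replica per fully idle core, capped.

    cpu_utilization is the fraction of total CPU capacity in use (0.0 to 1.0).
    The cap of 2 extra replicas (3 copies in total) is an assumption.
    """
    if not 0.0 <= cpu_utilization <= 1.0:
        raise ValueError("cpu_utilization must be between 0.0 and 1.0")
    idle_cores = int((1.0 - cpu_utilization) * num_cores)  # rounds down
    return min(max_replicas, idle_cores)


def vote(outputs):
    """Compare the outputs of all copies of one computation.

    Returns (value, fault_detected). value is the strict-majority output,
    or None when no strict majority exists (fault detected, not recoverable).
    """
    if not outputs:
        raise ValueError("at least one output is required")
    counts = Counter(outputs)
    value, votes = counts.most_common(1)[0]
    fault_detected = len(counts) > 1
    has_majority = votes > len(outputs) / 2
    return (value if has_majority else None), fault_detected
```

Under these assumptions, a quad-core system at 20% utilization has room for two extra replicas, which allows a single corrupted output to be outvoted; with only one extra replica a mismatch can be detected but not corrected.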



FAU authors / FAU editors

Aliee, Hananeh
Chair of Computer Science 12 (Hardware-Software-Co-Design)


Author(s) from external institution(s)
Amirkabir University of Technology (AUT) / دانشگاه صنعتی امیرکبیر


How to cite

APA:
Aliee, H., & Zarandi, H.R. (2011). An efficient, dynamically adaptive method to tolerate transient faults in multi-core systems. In Proceedings of the 13th European Workshop on Dependable Computing (EWDC '11) (pp. 53-58). Pisa, IT.

MLA:
Aliee, Hananeh, and Hamid R. Zarandi. "An efficient, dynamically adaptive method to tolerate transient faults in multi-core systems." Proceedings of the 13th European Workshop on Dependable Computing, EWDC 2011, Pisa 2011. 53-58.

Last updated 2018-10-22 at 21:50