CPU-aware, process-level redundancy to tolerate faults in multi-core

Conference contribution
(Conference paper)


Publication details

Authors: Aliee H, Zarandi HR, Tajary A
Year of publication: 2011
Pages: 343-349
ISBN: 9781612843810


Abstract


This paper presents: 1) a dynamically scheduled Process-Level Redundancy (PLR) technique for enhancing the reliability of multi-core systems, 2) a comparison between PLR and Thread-Level Redundancy (TLR), and 3) a fault study on the thread selector unit of a modern processor. The proposed technique uses underutilized CPU resources to improve the fault tolerance of a system. In this technique, a set of redundant processes is created for each application process, and the number of replicas is then adjusted dynamically to achieve better performance. The reliability evaluation shows that PLR performs better than TLR when the reliability of the system's submodules is higher than approximately 0.8. Experimental results on some standard benchmarks show that, on average, CPU utilization stays below 20% during application execution; the proposed technique can use this spare capacity to provide 100% fault detection and recovery with a performance overhead of almost 10%. The fault study also shows that, of 7000 faults injected into the thread selector module using the OpenSPARC simulator, 83.5% are benign and 16.5% lead to system failure, affecting either the hardware (13.7%) or program outputs (2.8%). All of these faults can be detected using the proposed technique. © 2011 IEEE.
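The idea summarized in the abstract (replicate each application process, adapt the number of replicas to idle CPU capacity, and compare replica outputs) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `choose_replica_count`, `majority_vote`, and `run_replicated`, the idle-core policy, and the cap of two extra replicas are all assumptions made for illustration only.

```python
import subprocess
import sys
from collections import Counter


def choose_replica_count(cpu_utilization, cores, max_extra=2):
    """Hypothetical policy: one extra copy per idle core, capped at max_extra."""
    idle_cores = int(cores * (1.0 - cpu_utilization))
    # Total copies = the original process plus the extra replicas.
    return 1 + max(0, min(max_extra, idle_cores))


def majority_vote(outputs):
    """Compare replica outputs; return (agreed_output_or_None, fault_detected)."""
    counts = Counter(outputs)
    winner, votes = counts.most_common(1)[0]
    fault_detected = len(counts) > 1           # any disagreement signals a fault
    has_majority = votes > len(outputs) / 2    # strict majority allows recovery
    return (winner if has_majority else None), fault_detected


def run_replicated(argv, copies):
    """Run the same command as `copies` concurrent OS processes and vote."""
    procs = [
        subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
        for _ in range(copies)
    ]
    outputs = [p.communicate()[0] for p in procs]
    return majority_vote(outputs)


if __name__ == "__main__":
    # Assumed workload: 4 cores at 20% utilization leaves room for 3 copies.
    copies = choose_replica_count(cpu_utilization=0.2, cores=4)
    result, fault = run_replicated([sys.executable, "-c", "print(6 * 7)"], copies)
    print(copies, repr(result), fault)
```

In this sketch, two copies are enough to detect a mismatch, while three allow recovery by majority; how the paper's scheduler actually selects and adjusts the replica count is not specified in the abstract.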



FAU authors / FAU editors

Aliee, Hananeh
Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)


Institutions of other authors

Amirkabir University of Technology (AUT) / دانشگاه صنعتی امیرکبیر


How to cite

APA:
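Aliee, H., Zarandi, H. R., & Tajary, A. (2011). CPU-aware, process-level redundancy to tolerate faults in multi-core. In Proceedings of the 2011 International Conference on High Performance Computing and Simulation, HPCS 2011 (pp. 343-349). Istanbul.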

MLA:
Aliee, Hananeh, Hamid R. Zarandi, and Alireza Tajary. "CPU-aware, process-level redundancy to tolerate faults in multi-core." Proceedings of the 2011 International Conference on High Performance Computing and Simulation, HPCS 2011, Istanbul 2011. 343-349.

Last updated 2018-10-21 at 14:50
