Crosscheck: Hardening replicated multithreaded services
Martens A, Borchert C, Geissler TO, Lohmann D, Spinczyk O, Kapitza R (2014)
Publication Type: Conference contribution
Publication year: 2014
Publisher: IEEE Computer Society
Page range: 648-653
Conference Proceedings Title: Proceedings of the International Conference on Dependable Systems and Networks
Event location: Atlanta, GA, USA
ISBN: 9781479922338
DOI: 10.1109/DSN.2014.98
State-machine replication has received widespread attention for the provisioning of highly available services in data centers. However, current production systems focus on tolerating crash faults only, and prominent service outages caused by state corruptions have shown that this is a risky strategy. In the future, state corruptions due to transient faults (such as bit flips) will become even more likely, driven by ongoing hardware trends toward smaller structure sizes and lower operating voltages. In this paper, we present Crosscheck, an approach to tolerate arbitrary state corruption (ASC) in the context of fault-tolerant replication of multithreaded services. Crosscheck is able to detect silent data corruptions ahead of execution, and by crosschecking state changes with co-executing replicas, even ASCs can be detected. Finally, fault tolerance is achieved by a fine-grained recovery using fault-free replicas. Our implementation is transparent to the application because it applies fine-grained software-hardening mechanisms through aspect-oriented programming. To validate Crosscheck, we present a replicated multithreaded key-value store that is resilient to state corruptions.
APA:
Martens, A., Borchert, C., Geissler, T.O., Lohmann, D., Spinczyk, O., & Kapitza, R. (2014). Crosscheck: Hardening replicated multithreaded services. In Proceedings of the International Conference on Dependable Systems and Networks (pp. 648-653). Atlanta, GA, USA: IEEE Computer Society.
MLA:
Martens, Arthur, et al. "Crosscheck: Hardening replicated multithreaded services." Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014, Atlanta, GA, USA. IEEE Computer Society, 2014. 648-653.