On-demand fault-tolerant loop processing on massively parallel processor arrays

Tanase AP, Witterauf M, Teich J, Hannig F, Lari V (2015)


Publication Status: Published

Publication Type: Conference contribution

Publication year: 2015

Publisher: Institute of Electrical and Electronics Engineers Inc.

Pages Range: 194-201

Article Number: 7245734

Conference Proceedings Title: Proceedings of the 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP)

Event location: Toronto, CA

ISBN: 9781479919246

DOI: 10.1109/ASAP.2015.7245734

Abstract

We present a compilation-based technique for providing on-demand structural redundancy for massively parallel processor arrays. Thereby, application programmers gain the capability to trade throughput for reliability according to application requirements. To protect parallel loop computations against errors, we propose to apply the well-known fault tolerance schemes dual modular redundancy (DMR) and triple modular redundancy (TMR) to a whole region of the processor array rather than individual processing elements. At the source code level, the compiler realizes these replication schemes with a program transformation that: (1) replicates a parallel loop program two or three times for DMR or TMR, respectively, and (2) introduces appropriate voting operations whose frequency and location may be chosen from three proposed variants. Which variant to choose depends, for example, on the error resilience needs of the application or the expected soft error rates. Finally, we explore the different tradeoffs of these variants in terms of performance overheads and error detection latency.
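The replicate-and-vote idea in the abstract can be illustrated with a minimal sketch. It is not the authors' compiler transformation: the helper names (`tmr_vote`, `dmr_check`, `run_replicated`) and the `fault` parameter are hypothetical, the replicas run sequentially rather than on separate regions of a processor array, and voting after every iteration is just one possible placement (the abstract does not spell out its three voting variants).

```python
from typing import Callable, List, Optional, Tuple


def tmr_vote(a: int, b: int, c: int) -> int:
    """Majority vote over three replica results (TMR): masks one faulty replica."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("TMR: all three replicas disagree")


def dmr_check(a: int, b: int) -> int:
    """Compare two replica results (DMR): detects a mismatch but cannot correct it."""
    if a != b:
        raise RuntimeError("DMR: replica mismatch detected")
    return a


def run_replicated(
    body: Callable[[int], int],
    inputs: List[int],
    replicas: int,
    fault: Optional[Tuple[int, int]] = None,
) -> List[int]:
    """Run the loop `body` on 2 (DMR) or 3 (TMR) replicas, voting after each iteration.

    `fault` is an assumed test hook: (replica_index, iteration) at which one
    replica's result is corrupted to simulate a soft error.
    """
    if replicas not in (2, 3):
        raise ValueError("replicas must be 2 (DMR) or 3 (TMR)")
    outputs = []
    for i, x in enumerate(inputs):
        # Each replica computes the same iteration independently.
        results = [body(x) for _ in range(replicas)]
        if fault is not None and fault[1] == i:
            results[fault[0]] += 1  # inject a simulated soft error
        if replicas == 3:
            outputs.append(tmr_vote(*results))
        else:
            outputs.append(dmr_check(*results))
    return outputs
```

With three replicas, a single corrupted result is outvoted and the loop output stays correct. With two replicas the same fault is only detected, which reflects the throughput-versus-reliability tradeoff the abstract describes.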


How to cite

APA:

Tanase, A.-P., Witterauf, M., Teich, J., Hannig, F., & Lari, V. (2015). On-demand fault-tolerant loop processing on massively parallel processor arrays. In Proceedings of the 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP) (pp. 194-201). Toronto, CA: Institute of Electrical and Electronics Engineers Inc.

MLA:

Tanase, Alexandru-Petru, et al. "On-demand fault-tolerant loop processing on massively parallel processor arrays." Proceedings of the 26th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2015, Toronto, Institute of Electrical and Electronics Engineers Inc., 2015. 194-201.
