Fault-Tolerant Low-Precision DNNs using Explainable AI

Sabih M, Hannig F, Teich J (2021)


Publication Language: English

Publication Type: Conference Contribution

Publication year: 2021

Publisher: IEEE Xplore

Conference Proceedings Title: 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)

Event location: Virtual Workshop

ISBN: 978-1-6654-3951-0

URI: https://ieeexplore.ieee.org/document/9502445/

DOI: 10.1109/DSN-W52860.2021.00036

Abstract

Hardware-efficient machine learning systems deployed in safety-critical settings must be optimized for two conflicting objectives. One is making neural networks more compact and efficient through techniques such as quantization and pruning. The other is making them robust against faults. Robustness of Deep Neural Networks (DNNs) becomes a challenge at low bit precision, e.g., in networks quantized to 8-bit integer precision or less, and on custom DNN accelerators with emerging memory technologies and techniques such as approximate memory. Errors in the processing of DNNs can be modeled as bit-flips at the software level. These bit-flips can be caused by persistent memory errors, manifesting themselves as corrupted DNN weights, or by transient soft errors.

In this work, we introduce an open-source fault-injection framework for PyTorch that simulates both persistent and transient errors. We then propose novel algorithms that utilize explainable AI methods to make DNNs robust against both kinds of error, i.e., persistent memory errors and transient soft errors, while keeping the overhead small. We show that our approach can beneficially mitigate (1) persistent errors, by identifying important bits in the weights and selectively protecting them with error-correcting codes, and (2) transient errors, by identifying important samples in the activations and selectively protecting them with Triple Modular Redundancy (TMR). Our proposed method outperforms previously known work on selectively protecting DNN bits in terms of performance and creates further opportunities for research in utilizing explainable AI for robustness.
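The software-level fault model described above, i.e., bit-flips in quantized weights, can be sketched in a few lines. The snippet below is an illustrative NumPy sketch, not the paper's open-source PyTorch framework; the function name `inject_bit_flips` and its interface are assumptions for this example. It also shows why bit importance matters: a sign-bit flip corrupts an int8 weight far more than a low-order flip, which motivates selectively protecting only the important bits.

```python
import numpy as np

def inject_bit_flips(weights, n_flips, seed=None):
    """Randomly flip bits in an int8 weight tensor.

    Models persistent memory faults as software-level bit-flips.
    (Illustrative sketch; not the framework from the paper.)
    """
    rng = np.random.default_rng(seed)
    faulty = weights.copy()
    raw = faulty.view(np.uint8)  # reinterpret bytes so XOR cannot overflow
    positions = rng.integers(0, raw.size, size=n_flips)
    bits = rng.integers(0, 8, size=n_flips)
    for pos, bit in zip(positions, bits):
        raw.flat[pos] ^= np.uint8(1 << int(bit))
    return faulty

# A flip in the sign bit (bit 7) corrupts a weight heavily, while a
# flip in the least-significant bit barely changes it.
w = np.array([16, 16], dtype=np.int8)
msb = w.copy(); msb.view(np.uint8)[0] ^= np.uint8(1 << 7)   # 16 -> -112
lsb = w.copy(); lsb.view(np.uint8)[1] ^= np.uint8(1 << 0)   # 16 -> 17
```

The uneven impact of the two flips above is exactly the asymmetry that explainability-guided selective protection exploits: protecting only the high-impact bits with error-correcting codes keeps the overhead small.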

How to cite

APA:

Sabih, M., Hannig, F., & Teich, J. (2021). Fault-Tolerant Low-Precision DNNs using Explainable AI. In 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). Virtual Workshop: IEEE Xplore.

MLA:

Sabih, Muhammad, Frank Hannig, and Jürgen Teich. "Fault-Tolerant Low-Precision DNNs using Explainable AI." Proceedings of the Workshop on Dependable and Secure Machine Learning (DSML), Virtual Workshop: IEEE Xplore, 2021.
