Sabih M, Hannig F, Teich J (2021)
Publication Language: English
Publication Type: Conference contribution
Publication year: 2021
Publisher: IEEE Xplore
Conference Proceedings Title: 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)
Event location: Virtual Workshop
Hardware-efficient machine learning systems deployed in safety-critical applications must be optimized for two conflicting objectives. The first is making neural networks more compact and efficient through techniques such as quantization and pruning. The second is making them robust against faults. Robustness becomes a particular challenge for low-precision Deep Neural Networks (DNNs), such as networks quantized to 8-bit integers or fewer, and for custom DNN accelerators built on emerging memory technologies or techniques such as approximate memory. Errors in the processing of DNNs can be modeled as bit-flips at the software level. These bit-flips may stem from persistent memory errors, which manifest as corrupted DNN weights, or from transient soft errors.

In this work, we introduce an open-source fault-injection framework that simulates both persistent and transient errors in PyTorch. We then propose novel algorithms that use explainable AI methods to make DNNs robust against both kinds of errors, i.e., persistent memory errors and transient soft errors, while keeping the overhead small. We show that our approach can mitigate (1) persistent errors, by identifying important bits in weights and selectively protecting them with error-correcting codes, and (2) transient errors, by identifying important samples in activations and selectively protecting them with Triple Modular Redundancy (TMR). Our proposed method outperforms previously known work on selectively protecting DNN bits in terms of performance and opens further opportunities for research on using explainable AI for robustness.
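The bit-flip fault model and the two protection mechanisms named in the abstract can be illustrated with a small, framework-free Python sketch. It is not the authors' PyTorch framework or their explainable-AI selection algorithm. The helper names (`flip_bit_int8`, `inject_persistent_faults`, `tmr_vote`) are hypothetical. ECC protection is idealized as "flips in protected bit positions are always corrected". Which bits to protect is supplied by the caller rather than derived from an importance analysis.

```python
import random


def flip_bit_int8(value: int, bit: int) -> int:
    """Flip one bit of a signed 8-bit integer (two's complement) and return the signed result."""
    if not -128 <= value <= 127:
        raise ValueError("value out of int8 range")
    if not 0 <= bit <= 7:
        raise ValueError("bit index must be in 0..7")
    raw = (value & 0xFF) ^ (1 << bit)  # operate on the unsigned byte representation
    return raw - 256 if raw >= 128 else raw  # reinterpret as signed int8


def inject_persistent_faults(weights, n_flips, protected_bits=frozenset(), rng=None):
    """Return a copy of an int8 weight list with n_flips random single-bit flips.

    Assumption (hypothetical, idealized): bit positions in protected_bits are
    covered by an error-correcting code, so flips landing there are corrected
    and leave the weight unchanged.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducible fault campaigns
    faulty = list(weights)
    for _ in range(n_flips):
        i = rng.randrange(len(faulty))
        bit = rng.randrange(8)
        if bit in protected_bits:
            continue  # assumed corrected by ECC
        faulty[i] = flip_bit_int8(faulty[i], bit)
    return faulty


def tmr_vote(a, b, c):
    """Element-wise majority vote over three redundant copies of an activation vector.

    If at least two copies agree, their value wins. If all three differ,
    there is no majority and the second copy is returned arbitrarily.
    """
    voted = []
    for x, y, z in zip(a, b, c):
        voted.append(x if x == y or x == z else y)
    return voted
```

Flipping bit 7 (the sign bit in two's complement) of the weight `1` yields `-127`, while flipping bit 0 only changes it to `0`. This is one intuition for why some bit positions matter far more than others and are worth protecting selectively. Likewise, `tmr_vote([1, 2, 3], [1, 9, 3], [7, 2, 3])` masks the single corrupted value in each position and returns `[1, 2, 3]`.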
Sabih, M., Hannig, F., & Teich, J. (2021). Fault-Tolerant Low-Precision DNNs using Explainable AI. In 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). Virtual Workshop: IEEE Xplore.
Sabih, Muhammad, Frank Hannig, and Jürgen Teich. "Fault-Tolerant Low-Precision DNNs using Explainable AI." Proceedings of the Workshop on Dependable and Secure Machine Learning (DSML), Virtual Workshop: IEEE Xplore, 2021.