Sabih M, Mishra A, Hannig F, Teich J (2022)
Publication Language: English
Publication Type: Conference contribution
Publication year: 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
City/Town: Pittsburgh, PA, USA
Pages Range: 1-8
Conference Proceedings Title: 2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC)
Event location: Virtual
ISBN: 978-1-6654-6551-9
URI: https://ieeexplore.ieee.org/document/9969374
DOI: 10.1109/IGSC55832.2022.9969363
Open Access Link: https://www.computer.org/csdl/proceedings-article/igsc/2022/09969
Deep neural networks (DNNs) are computationally intensive, making them difficult to deploy on resource-constrained embedded systems. Model compression is a set of techniques that remove redundancies from a neural network with affordable degradation in task performance. Most compression methods do not directly target hardware-based objectives such as latency; a few approximate latency with floating-point operations (FLOPs) or multiply-accumulate operations (MACs). However, these indirect metrics do not translate directly to the relevant performance metrics on the hardware, i.e., latency and throughput. To address this limitation, we introduce Multi-Objective Sensitivity Pruning ("MOSP"), a three-stage pipeline for filter pruning: hardware-aware sensitivity analysis, criteria-optimal configuration selection, and pruning based on explainable AI (XAI). Our pipeline is compatible with a single target objective or a combination of objectives, such as latency, energy consumption, and accuracy. Our method first formulates the sensitivity of a model's layers to the target objectives as a classical machine learning problem. Next, we choose a criteria-optimal configuration controlled by hyperparameters specific to each chosen objective. Finally, we apply XAI-based filter ranking to select the filters to be pruned. The pipeline follows an iterative pruning methodology to recover any loss in task performance (e.g., accuracy). We allow the user to prefer one objective over another. Our method outperforms the selected baseline method across different neural networks and datasets in both accuracy and latency reduction and is competitive with state-of-the-art approaches.
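The abstract does not specify how a configuration is chosen once per-objective preferences are set. As a rough illustration only, and not the authors' algorithm, the sketch below assumes a simple weighted-sum selection over a handful of candidate pruning configurations. The `Candidate` class, the `select_configuration` function, the weight dictionary, and all numbers are hypothetical and made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical pruning configuration with predicted outcomes (illustrative values)."""
    ratios: tuple          # fraction of filters pruned in each layer
    accuracy_drop: float   # predicted accuracy loss, in percentage points
    latency_ms: float      # predicted latency on the target hardware
    energy_mj: float       # predicted energy per inference


OBJECTIVES = ("accuracy_drop", "latency_ms", "energy_mj")


def select_configuration(candidates, weights):
    """Return the candidate with the lowest weighted sum of min-max normalized costs.

    Assumption: preferences are expressed as one non-negative weight per
    objective; objectives missing from `weights` are ignored.
    """
    def normalized(name, value):
        values = [getattr(c, name) for c in candidates]
        lo, hi = min(values), max(values)
        return 0.0 if hi == lo else (value - lo) / (hi - lo)

    def cost(candidate):
        return sum(
            weights.get(name, 0.0) * normalized(name, getattr(candidate, name))
            for name in OBJECTIVES
        )

    return min(candidates, key=cost)


candidates = [
    Candidate((0.0, 0.0, 0.0), accuracy_drop=0.0, latency_ms=10.0, energy_mj=5.0),
    Candidate((0.3, 0.5, 0.2), accuracy_drop=1.0, latency_ms=6.0, energy_mj=3.5),
    Candidate((0.6, 0.7, 0.5), accuracy_drop=4.0, latency_ms=4.0, energy_mj=2.0),
]

# Preferring latency selects the most aggressive configuration;
# weighting accuracy more heavily selects the moderate one.
latency_first = select_configuration(candidates, {"latency_ms": 1.0, "accuracy_drop": 0.2})
balanced = select_configuration(candidates, {"latency_ms": 0.5, "accuracy_drop": 1.0})
```

Changing the weights shifts which configuration wins, which is one plausible reading of letting the user prefer one objective over another; the paper itself may use a different selection rule.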
APA:
Sabih, M., Mishra, A., Hannig, F., & Teich, J. (2022). MOSP: Multi-Objective Sensitivity Pruning of Deep Neural Networks. In IEEE (Eds.), 2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC) (pp. 1-8). Virtual: Pittsburgh, PA, USA: Institute of Electrical and Electronics Engineers (IEEE).
MLA:
Sabih, Muhammad, et al. "MOSP: Multi-Objective Sensitivity Pruning of Deep Neural Networks." Proceedings of the 13th International Green and Sustainable Computing Conference (IGSC), Virtual Ed. IEEE, Pittsburgh, PA, USA: Institute of Electrical and Electronics Engineers (IEEE), 2022. 1-8.