Posewsky T, Ziener D (2018)
Publication Language: English
Publication Type: Conference Contribution
Publication year: 2018
Conference Proceedings Title: Proceedings of the International Conference on Architecture of Computing Systems
DOI: 10.1007/978-3-319-77610-1_23
In this paper, we present an architecture for embedded FPGA-based deep neural network inference that is able to handle pruned weight matrices. Pruning weights and neurons significantly reduces the amount of stored data and the number of calculations, which greatly improves the efficiency and performance of neural network inference on embedded devices. By using an HLS approach, the architecture is easily extendable, and the number of MAC units and activation functions used is configurable at design time. For large neural networks, our approach achieves performance at least comparable to a state-of-the-art x86-based software implementation while using only 10% of the energy.
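To illustrate the kind of computation such a pruned-inference architecture performs, the following is a minimal C sketch of a fully connected layer over a CSR-like compressed weight matrix with a ReLU activation. The storage format, function name, and parameters are illustrative assumptions and do not reflect the paper's actual hardware interface; the point is that work scales with the surviving (unpruned) weights rather than the dense layer size.

/* Sketch of a pruned fully connected layer: sparse matrix-vector
 * product over a CSR-like compressed weight matrix, then ReLU.
 * All names and the format are assumptions, not the paper's API. */
#include <stddef.h>

void pruned_layer(size_t n_out,
                  const float *values,   /* nonzero weights only   */
                  const size_t *col_idx, /* input index per weight */
                  const size_t *row_ptr, /* n_out + 1 row offsets  */
                  const float *x,        /* input activations      */
                  float *y)              /* output activations     */
{
    for (size_t row = 0; row < n_out; ++row) {
        float acc = 0.0f;
        /* Only stored (unpruned) weights are fetched and
         * multiply-accumulated, so memory traffic and MAC
         * operations both shrink with the pruning ratio. */
        for (size_t k = row_ptr[row]; k < row_ptr[row + 1]; ++k)
            acc += values[k] * x[col_idx[k]];
        y[row] = acc > 0.0f ? acc : 0.0f; /* ReLU activation */
    }
}

In an HLS flow, the inner multiply-accumulate loop is the natural place to instantiate a design-time-configurable number of parallel MAC units, which matches the configurability the abstract describes.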
APA:
Posewsky, T., & Ziener, D. (2018). A Flexible FPGA-based Inference Architecture for Pruned Deep Neural Networks. In Proceedings of the International Conference on Architecture of Computing Systems. Braunschweig, DE.
MLA:
Posewsky, Thorbjörn, and Daniel Ziener. "A Flexible FPGA-based Inference Architecture for Pruned Deep Neural Networks." Proceedings of the International Conference on Architecture of Computing Systems, Braunschweig, 2018.