A Flexible FPGA-based Inference Architecture for Pruned Deep Neural Networks

Conference contribution
(Conference paper)


Publication details

Author(s): Posewsky T, Ziener D
Year of publication: 2018
Conference proceedings: Proceedings of the International Conference on Architecture of Computing Systems
Language: English


Abstract


In this paper, we present an architecture for embedded FPGA-based deep neural network inference that is able to handle pruned weight matrices. Pruning weights and neurons significantly reduces the amount of stored data and the number of calculations, which greatly improves the efficiency and performance of neural network inference on embedded devices. By using an HLS approach, the architecture is easily extendable, and the number of MAC units and activation functions used is configurable at design time. For large neural networks, our approach achieves at least comparable performance to a state-of-the-art x86-based software implementation while using only 10% of the energy.
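To illustrate the idea behind inference over pruned weight matrices, the following C sketch shows a fully connected layer whose weights are stored in compressed sparse row (CSR) form, so pruned (zero) weights are neither stored nor multiplied. This is not the paper's implementation; the storage format, the type csr_layer_t, and the function csr_layer_forward are hypothetical names chosen for illustration. In an HLS flow such as the one the abstract mentions, the inner MAC loop is the kind of structure that could be mapped onto a design-time-configurable number of MAC units.

```c
#include <stddef.h>

/* Illustrative sketch only (not the authors' architecture): a fully
 * connected layer with a pruned weight matrix in CSR format.        */
typedef struct {
    size_t        rows;      /* number of output neurons             */
    const size_t *row_ptr;   /* rows+1 entries: start of each row    */
    const size_t *col_idx;   /* input-neuron index of each nonzero   */
    const float  *values;    /* nonzero weights that survived pruning*/
    const float  *bias;      /* one bias per output neuron           */
} csr_layer_t;

static float relu(float x) { return x > 0.0f ? x : 0.0f; }

/* Sparse matrix-vector product followed by a ReLU activation.
 * The inner loop visits only the non-pruned weights, so storage and
 * multiply-accumulate work both shrink with the pruning ratio.      */
void csr_layer_forward(const csr_layer_t *layer,
                       const float *input, float *output)
{
    for (size_t r = 0; r < layer->rows; ++r) {
        float acc = layer->bias[r];
        for (size_t k = layer->row_ptr[r]; k < layer->row_ptr[r + 1]; ++k) {
            acc += layer->values[k] * input[layer->col_idx[k]];
        }
        output[r] = relu(acc);
    }
}
```

As a usage note, a host or testbench would fill a csr_layer_t with the pruned weights of one layer and call csr_layer_forward once per layer; the choice of activation function (here ReLU) and the degree of loop unrolling are the kinds of parameters the abstract describes as configurable at design time.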



FAU authors / FAU editors

Ziener, Daniel Dr.-Ing.
Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)


Author(s) of the external institution(s)
Technische Universität Hamburg-Harburg (TUHH)


Citation formats

APA:
Posewsky, T., & Ziener, D. (2018). A Flexible FPGA-based Inference Architecture for Pruned Deep Neural Networks. In Proceedings of the International Conference on Architecture of Computing Systems. Braunschweig, DE.

MLA:
Posewsky, Thorbjörn, and Daniel Ziener. "A Flexible FPGA-based Inference Architecture for Pruned Deep Neural Networks." Proceedings of the International Conference on Architecture of Computing Systems, Braunschweig 2018.


Last updated 2018-08-08 at 03:49