A Flexible FPGA-based Inference Architecture for Pruned Deep Neural Networks

Posewsky T, Ziener D (2018)


Publication Language: English

Publication Type: Conference contribution

Publication year: 2018

Conference Proceedings Title: Proceedings of the International Conference on Architecture of Computing Systems

Event location: Braunschweig, DE

DOI: 10.1007/978-3-319-77610-1_23

Abstract

In this paper, we present an architecture for embedded FPGA-based deep neural network inference that is able to handle pruned weight matrices. Pruning of weights and neurons significantly reduces the amount of stored data and the number of calculations, which greatly improves the efficiency and performance of neural network inference on embedded devices. By using an HLS approach, the architecture is easily extendable, and the number of used MAC units and activation functions is configurable at design time. For large neural networks, our approach achieves at least comparable performance to a state-of-the-art x86-based software implementation while using only 10% of the energy.
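To illustrate the idea behind the abstract, the following is a minimal software sketch of inference over a pruned fully-connected layer, where only the non-zero weights are stored in a compressed sparse row (CSR) layout. This is not the paper's hardware design; the `PrunedLayer` structure and `infer` function are hypothetical names chosen for this example, and the ReLU activation is an assumption. The point it demonstrates is the one the abstract makes: memory traffic and MAC operations scale with the number of surviving weights rather than with the full matrix size.

```cpp
#include <cstddef>
#include <vector>

// Pruned fully-connected layer stored in compressed sparse row (CSR) form:
// only non-zero weights are kept, so storage and MAC count scale with the
// number of surviving weights rather than rows * cols.
struct PrunedLayer {
    std::vector<float>  values;   // non-zero weights
    std::vector<size_t> cols;     // input index of each non-zero weight
    std::vector<size_t> row_ptr;  // values/cols range per output neuron
};

// Dense output = ReLU(W_sparse * input); one MAC per stored weight.
std::vector<float> infer(const PrunedLayer& layer,
                         const std::vector<float>& input) {
    const size_t rows = layer.row_ptr.size() - 1;
    std::vector<float> out(rows, 0.0f);
    for (size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (size_t i = layer.row_ptr[r]; i < layer.row_ptr[r + 1]; ++i)
            acc += layer.values[i] * input[layer.cols[i]];
        out[r] = acc > 0.0f ? acc : 0.0f;  // assumed ReLU activation
    }
    return out;
}
```

In a hardware realization such as the one the paper describes, the inner accumulation would be mapped onto a design-time-configurable number of parallel MAC units, with the sparse format determining which input activations are fetched.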


How to cite

APA:

Posewsky, T., & Ziener, D. (2018). A Flexible FPGA-based Inference Architecture for Pruned Deep Neural Networks. In Proceedings of the International Conference on Architecture of Computing Systems. Braunschweig, DE.

MLA:

Posewsky, Thorbjörn, and Daniel Ziener. "A Flexible FPGA-based Inference Architecture for Pruned Deep Neural Networks." Proceedings of the International Conference on Architecture of Computing Systems, Braunschweig 2018.
