Loop Parallelization Techniques for FPGA Accelerator Synthesis

Beitrag in einer Fachzeitschrift

Details zur Publikation

Autorinnen und Autoren: Reiche O, Özkan MA, Hannig F, Teich J, Schmid M
Zeitschrift: Journal of Signal Processing Systems
Jahr der Veröffentlichung: 2018
Band: 90
Heftnummer: 1
Seitenbereich: 3-27
ISSN: 1939-8115
Sprache: Englisch


Current tools for High-Level Synthesis (HLS) excel at exploiting Instruction-Level Parallelism (ILP). The support for Data-Level Parallelism (DLP), one of the key advantages of Field Programmable Gate Arrays (FPGAs), is in contrast very limited. This work examines the exploitation of DLP on FPGAs using code generation for C-based HLS of image filters and streaming pipelines. In addition to well-known loop tiling techniques, we propose loop coarsening, which delivers superior performance and scalability. Loop tiling corresponds to splitting an image into separate regions, which are then processed in parallel by replicated accelerators. For data streaming, this also requires the generation of glue logic for the distribution of image data. Conversely, loop coarsening allows processing multiple pixels in parallel, whereby only the kernel operator is replicated within a single accelerator. We present concrete implementations of tiling and coarsening for Vivado HLS and Altera OpenCL. Furthermore, we present a comparison of our implementations to the keyword-driven parallelization support provided by the Altera Offline Compiler. We augment the FPGA back end of the heterogeneous Domain-Specific Language (DSL) framework Hipacc to generate loop coarsening implementations for Vivado HLS and Altera OpenCL. Moreover, we compare the resulting FPGA accelerators to highly optimized software implementations for Graphics Processing Units (GPUs), all generated from exactly the same code base.

FAU-Autorinnen und Autoren / FAU-Herausgeberinnen und Herausgeber

Hannig, Frank PD Dr.-Ing.
Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)
Özkan, Mehmet Akif
Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)
Reiche, Oliver
Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)
Schmid, Moritz
Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)
Teich, Jürgen Prof. Dr.-Ing.
Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)


Architecture and Compiler Design
Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)


Reiche, O., Özkan, M.A., Hannig, F., Teich, J., & Schmid, M. (2018). Loop Parallelization Techniques for FPGA Accelerator Synthesis. Journal of Signal Processing Systems, 90(1), 3-27. https://dx.doi.org/10.1007/s11265-017-1229-7

Reiche, Oliver, et al. "Loop Parallelization Techniques for FPGA Accelerator Synthesis." Journal of Signal Processing Systems 90.1 (2018): 3-27.


Zuletzt aktualisiert 2019-02-01 um 19:10