Loop Parallelization Techniques for FPGA Accelerator Synthesis

Journal article

Publication Details

Author(s): Reiche O, Özkan MA, Hannig F, Teich J, Schmid M
Journal: Journal of Signal Processing Systems
Publication year: 2018
Volume: 90
Journal issue: 1
Pages range: 3-27
ISSN: 1939-8115
Language: English


Current tools for High-Level Synthesis (HLS) excel at exploiting Instruction-Level Parallelism (ILP). The support for Data-Level Parallelism (DLP), one of the key advantages of Field Programmable Gate Arrays (FPGAs), is in contrast very limited. This work examines the exploitation of DLP on FPGAs using code generation for C-based HLS of image filters and streaming pipelines. In addition to well-known loop tiling techniques, we propose loop coarsening, which delivers superior performance and scalability. Loop tiling corresponds to splitting an image into separate regions, which are then processed in parallel by replicated accelerators. For data streaming, this also requires the generation of glue logic for the distribution of image data. Conversely, loop coarsening allows processing multiple pixels in parallel, whereby only the kernel operator is replicated within a single accelerator. We present concrete implementations of tiling and coarsening for Vivado HLS and Altera OpenCL. Furthermore, we present a comparison of our implementations to the keyword-driven parallelization support provided by the Altera Offline Compiler. We augment the FPGA back end of the heterogeneous Domain-Specific Language (DSL) framework Hipacc to generate loop coarsening implementations for Vivado HLS and Altera OpenCL. Moreover, we compare the resulting FPGA accelerators to highly optimized software implementations for Graphics Processing Units (GPUs), all generated from exactly the same code base.

FAU Authors / FAU Editors

Hannig, Frank PD Dr.-Ing.
Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)
Özkan, Mehmet Akif
Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)
Reiche, Oliver
Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)
Schmid, Moritz
Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)
Teich, Jürgen Prof. Dr.-Ing.
Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)

Research Fields

Architecture and Compiler Design
Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)

How to cite

Reiche, O., Özkan, M.A., Hannig, F., Teich, J., & Schmid, M. (2018). Loop Parallelization Techniques for FPGA Accelerator Synthesis. Journal of Signal Processing Systems, 90(1), 3-27. https://dx.doi.org/10.1007/s11265-017-1229-7

Reiche, Oliver, et al. "Loop Parallelization Techniques for FPGA Accelerator Synthesis." Journal of Signal Processing Systems 90.1 (2018): 3-27.


Last updated on 2019-02-01 at 19:10