Loop Parallelization Techniques for FPGA Accelerator Synthesis

Reiche O, Özkan MA, Hannig F, Teich J, Schmid M (2018)

Publication Language: English

Publication Type: Journal article

Publication year: 2018


Book Volume: 90

Page Range: 3-27

Journal Issue: 1

DOI: 10.1007/s11265-017-1229-7


Current tools for High-Level Synthesis (HLS) excel at exploiting Instruction-Level Parallelism (ILP). In contrast, support for Data-Level Parallelism (DLP), one of the key advantages of Field-Programmable Gate Arrays (FPGAs), is very limited. This work examines the exploitation of DLP on FPGAs using code generation for C-based HLS of image filters and streaming pipelines. In addition to well-known loop tiling techniques, we propose loop coarsening, which delivers superior performance and scalability. Loop tiling corresponds to splitting an image into separate regions, which are then processed in parallel by replicated accelerators. For data streaming, this also requires the generation of glue logic for the distribution of image data. Conversely, loop coarsening allows processing multiple pixels in parallel, whereby only the kernel operator is replicated within a single accelerator. We present concrete implementations of tiling and coarsening for Vivado HLS and Altera OpenCL. Furthermore, we compare our implementations to the keyword-driven parallelization support provided by the Altera Offline Compiler. We augment the FPGA back end of the heterogeneous Domain-Specific Language (DSL) framework Hipacc to generate loop coarsening implementations for Vivado HLS and Altera OpenCL. Moreover, we compare the resulting FPGA accelerators to highly optimized software implementations for Graphics Processing Units (GPUs), all generated from exactly the same code base.
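
The difference between the two loop transformations described in the abstract can be illustrated with a minimal plain-C sketch. This is not the code generated by Hipacc or taken from the paper: the point operator `kernel_op`, the image dimensions, the tile count `TILES`, and the coarsening factor `CF` are all hypothetical names and values chosen for illustration. The sketch runs sequentially on a CPU; in an HLS flow, the tile calls would map to replicated accelerators and the fixed-trip-count inner loop of the coarsened variant would typically be fully unrolled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WIDTH  8
#define HEIGHT 4
#define TILES  2  /* hypothetical number of replicated accelerators */
#define CF     4  /* hypothetical coarsening factor: pixels per iteration */

/* Hypothetical point operator standing in for an image-filter kernel. */
static uint8_t kernel_op(uint8_t p) {
    return (uint8_t)(255 - p); /* invert the pixel */
}

/* Baseline: one pixel per loop iteration. */
static void filter_baseline(const uint8_t *in, uint8_t *out) {
    for (size_t i = 0; i < (size_t)WIDTH * HEIGHT; ++i)
        out[i] = kernel_op(in[i]);
}

/* One accelerator instance working on the vertical stripe [x0, x1). */
static void accel_tile(const uint8_t *in, uint8_t *out, size_t x0, size_t x1) {
    for (size_t y = 0; y < HEIGHT; ++y)
        for (size_t x = x0; x < x1; ++x)
            out[y * WIDTH + x] = kernel_op(in[y * WIDTH + x]);
}

/* Loop tiling: split the image into TILES regions. Each call models a
 * replicated accelerator; in hardware the calls would run concurrently
 * and need glue logic to distribute the image data. */
static void filter_tiled(const uint8_t *in, uint8_t *out) {
    assert(WIDTH % TILES == 0); /* assumption: stripes divide the width evenly */
    const size_t tile_width = WIDTH / TILES;
    for (size_t t = 0; t < TILES; ++t)
        accel_tile(in, out, t * tile_width, (t + 1) * tile_width);
}

/* Loop coarsening: a single accelerator whose outer loop advances by CF
 * pixels. The inner loop has a constant trip count, so only kernel_op
 * is replicated when an HLS tool unrolls it. */
static void filter_coarsened(const uint8_t *in, uint8_t *out) {
    assert(((size_t)WIDTH * HEIGHT) % CF == 0); /* assumption: no remainder */
    for (size_t i = 0; i < (size_t)WIDTH * HEIGHT; i += CF)
        for (size_t k = 0; k < CF; ++k)
            out[i + k] = kernel_op(in[i + k]);
}
```

All three variants compute the same output; they differ only in how the iteration space is partitioned. Tiling replicates the whole loop nest per region, whereas coarsening keeps one loop nest and widens the work done per iteration.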

How to cite


Reiche, O., Özkan, M.A., Hannig, F., Teich, J., & Schmid, M. (2018). Loop Parallelization Techniques for FPGA Accelerator Synthesis. Journal of Signal Processing Systems, 90(1), 3-27. https://doi.org/10.1007/s11265-017-1229-7


Reiche, Oliver, et al. "Loop Parallelization Techniques for FPGA Accelerator Synthesis." Journal of Signal Processing Systems 90.1 (2018): 3-27.