Loop Parallelization Techniques for FPGA Accelerator Synthesis

Reiche O, Özkan MA, Hannig F, Teich J, Schmid M (2018)

Publication Language: English

Publication Type: Journal article

Publication year: 2018

Journal

Journal of Signal Processing Systems (Springer)

Volume: 90

Page Range: 3-27

Journal Issue: 1

DOI: 10.1007/s11265-017-1229-7

Abstract

Current tools for High-Level Synthesis (HLS) excel at exploiting Instruction-Level Parallelism (ILP). The support for Data-Level Parallelism (DLP), one of the key advantages of Field Programmable Gate Arrays (FPGAs), is in contrast very limited. This work examines the exploitation of DLP on FPGAs using code generation for C-based HLS of image filters and streaming pipelines. In addition to well-known loop tiling techniques, we propose loop coarsening, which delivers superior performance and scalability. Loop tiling corresponds to splitting an image into separate regions, which are then processed in parallel by replicated accelerators. For data streaming, this also requires the generation of glue logic for the distribution of image data. Conversely, loop coarsening allows processing multiple pixels in parallel, whereby only the kernel operator is replicated within a single accelerator. We present concrete implementations of tiling and coarsening for Vivado HLS and Altera OpenCL. Furthermore, we present a comparison of our implementations to the keyword-driven parallelization support provided by the Altera Offline Compiler. We augment the FPGA back end of the heterogeneous Domain-Specific Language (DSL) framework Hipacc to generate loop coarsening implementations for Vivado HLS and Altera OpenCL. Moreover, we compare the resulting FPGA accelerators to highly optimized software implementations for Graphics Processing Units (GPUs), all generated from exactly the same code base.
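The difference between the two loop transformations named in the abstract can be illustrated with a minimal plain-C sketch. It is not taken from the paper or from Hipacc-generated code. The image size, the parallelization factor `FACTOR`, the point operator `kernel_op`, and all function names are assumptions made for illustration. A point operator is used so that tiles need no border (halo) handling, which a real local image filter would require, and the vendor-specific directives that make an HLS tool replicate hardware are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WIDTH  16
#define HEIGHT 8
#define FACTOR 4 /* hypothetical parallelization factor; must divide WIDTH */

/* Hypothetical point operator standing in for the filter kernel. */
static uint8_t kernel_op(uint8_t p) { return (uint8_t)(255 - p); }

/* Baseline: one pixel per loop iteration. */
static void filter_baseline(const uint8_t *in, uint8_t *out) {
    for (int i = 0; i < WIDTH * HEIGHT; ++i)
        out[i] = kernel_op(in[i]);
}

/* One tile: the complete loop nest restricted to columns [x0, x1). */
static void filter_tile(const uint8_t *in, uint8_t *out, int x0, int x1) {
    for (int y = 0; y < HEIGHT; ++y)
        for (int x = x0; x < x1; ++x)
            out[y * WIDTH + x] = kernel_op(in[y * WIDTH + x]);
}

/* Loop tiling: the image is split into FACTOR vertical stripes. In
 * hardware, each call below would correspond to a separate replicated
 * accelerator working concurrently; here the calls run sequentially. */
static void filter_tiled(const uint8_t *in, uint8_t *out) {
    assert(WIDTH % FACTOR == 0);
    const int tile_w = WIDTH / FACTOR;
    for (int t = 0; t < FACTOR; ++t)
        filter_tile(in, out, t * tile_w, (t + 1) * tile_w);
}

/* Loop coarsening: a single loop whose iterations each handle FACTOR
 * adjacent pixels. Only kernel_op is replicated; an HLS tool would be
 * asked to fully unroll the inner loop. */
static void filter_coarsened(const uint8_t *in, uint8_t *out) {
    for (int i = 0; i < WIDTH * HEIGHT; i += FACTOR)
        for (int v = 0; v < FACTOR; ++v)
            out[i + v] = kernel_op(in[i + v]);
}

/* Returns 1 if all three variants produce identical output images. */
int check_equivalence(void) {
    uint8_t in[WIDTH * HEIGHT];
    uint8_t ref[WIDTH * HEIGHT], tiled[WIDTH * HEIGHT], coarse[WIDTH * HEIGHT];
    for (int i = 0; i < WIDTH * HEIGHT; ++i)
        in[i] = (uint8_t)(i * 7);
    filter_baseline(in, ref);
    filter_tiled(in, tiled);
    filter_coarsened(in, coarse);
    return memcmp(ref, tiled, sizeof ref) == 0 &&
           memcmp(ref, coarse, sizeof ref) == 0;
}
```

Calling `check_equivalence()` from a `main` function confirms that both transformations preserve the result of the baseline loop. The structural contrast is that tiling replicates the whole loop nest once per image region, whereas coarsening keeps one loop and widens the work done per iteration.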

Authors with CRIS profile

Oliver Reiche, Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)
Mehmet Akif Özkan, Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)
Frank Hannig, Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)
Jürgen Teich, Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)
Moritz Schmid, Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)

Related research project(s)

GRK 1773: Heterogeneous Image Systems, Project B3
(GRK 1773) RTG 1773: Heterogeneous Image Systems
Oct. 1, 2012 - March 31, 2017

A Domain-Specific Language and GPU Target Code Generator for Image Processing Applications (HIPAcc)
Jan. 1, 2009 - Jan. 1, 2012

How to cite

APA:

Reiche, O., Özkan, M.A., Hannig, F., Teich, J., & Schmid, M. (2018). Loop Parallelization Techniques for FPGA Accelerator Synthesis. Journal of Signal Processing Systems, 90(1), 3-27. https://doi.org/10.1007/s11265-017-1229-7

MLA:

Reiche, Oliver, et al. "Loop Parallelization Techniques for FPGA Accelerator Synthesis." Journal of Signal Processing Systems 90.1 (2018): 3-27.
