The Best of Both Worlds: Combining CUDA Graph with an Image Processing DSL

Qiao B, Özkan MA, Teich J, Hannig F (2020)


Publication Language: English

Publication Type: Conference contribution, Original article

Publication year: 2020

Publisher: IEEE

Conference Proceedings Title: Proceedings of the 57th Annual Design Automation Conference (DAC)

Event location: San Francisco, CA, US

DOI: 10.1109/DAC18072.2020.9218531

Abstract

CUDA graph is an asynchronous task-graph programming model recently released by Nvidia. It encapsulates application workflows in a graph, with nodes being operations connected by dependencies. The new API brings two benefits: reduced work launch overhead and whole-workflow optimizations. In this paper, we improve the ability of CUDA graph to exploit workflow optimizations, e.g., concurrent kernel executions with complementary resource occupancy. Additionally, we argue that the advantages of DSLs are complementary to CUDA graph, and that combining the two techniques offers the best of both worlds. Here, we propose a compiler-based approach that combines CUDA graph with Hipacc, an image processing DSL and source-to-source compiler. For ten image processing applications benchmarked on two Nvidia GPUs, our approach achieves a geometric mean speedup of 1.30 over Hipacc without CUDA graph, 1.11 over CUDA graph without Hipacc, and 3.96 over Halide, another state-of-the-art DSL.

How to cite

APA:

Qiao, B., Özkan, M.A., Teich, J., & Hannig, F. (2020). The Best of Both Worlds: Combining CUDA Graph with an Image Processing DSL. In Proceedings of the 57th Annual Design Automation Conference (DAC). San Francisco, CA, US: IEEE.

MLA:

Qiao, Bo, et al. "The Best of Both Worlds: Combining CUDA Graph with an Image Processing DSL." Proceedings of the 57th Annual Design Automation Conference (DAC), San Francisco, CA, IEEE, 2020.
