Qiao B, Özkan MA, Teich J, Hannig F (2020)
Publication Language: English
Publication Type: Conference contribution, Original article
Publication year: 2020
Publisher: IEEE
Conference Proceedings Title: Proceedings of the 57th Annual Design Automation Conference (DAC)
Event location: San Francisco, CA
DOI: 10.1109/DAC18072.2020.9218531
CUDA graph is an asynchronous task-graph programming model recently released by Nvidia. It encapsulates application workflows in a graph, with nodes being operations connected by dependencies. The new API brings two benefits: reduced work launch overhead and whole-workflow optimizations. In this paper, we improve the ability of CUDA graph to exploit workflow optimizations, e.g., concurrent kernel executions with complementary resource occupancy. Additionally, we argue that the advantages of DSLs are complementary to CUDA graph, and that combining the two techniques yields the best of both worlds. Here, we propose a compiler-based approach that combines CUDA graph with Hipacc, an image processing DSL and source-to-source compiler. For ten image processing applications benchmarked on two Nvidia GPUs, our approach achieves a geometric mean speedup of 1.30 over Hipacc without CUDA graph, 1.11 over CUDA graph without Hipacc, and 3.96 over Halide, another state-of-the-art DSL.
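The task-graph idea described in the abstract (operations as nodes, connected by dependencies) can be illustrated with a small toy model. This is a minimal sketch in plain Python, not the CUDA graph API and not the method of the paper: the function `concurrency_levels` and the workflow node names (`copy_in`, `blur_x`, `gradient`, `combine`, `copy_out`) are hypothetical, chosen only for illustration. It groups nodes into levels so that all nodes in one level have their dependencies satisfied and could, in principle, execute concurrently.

```python
from collections import defaultdict


def concurrency_levels(deps):
    """Group graph nodes into levels of mutually independent operations.

    `deps` maps each node name to the list of nodes it depends on.
    Every dependency must itself be a key of `deps`.
    """
    # Number of unfinished dependencies per node.
    indegree = {node: len(parents) for node, parents in deps.items()}

    # Reverse edges: for each node, the nodes that wait on it.
    children = defaultdict(list)
    for node, parents in deps.items():
        for parent in parents:
            children[parent].append(node)

    # Nodes with no dependencies can start immediately.
    ready = sorted(n for n, d in indegree.items() if d == 0)
    levels = []
    while ready:
        levels.append(ready)
        next_ready = []
        for node in ready:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    # If some node was never released, the graph is not a valid DAG.
    if sum(len(level) for level in levels) != len(deps):
        raise ValueError("dependency cycle or unknown dependency")
    return levels


# Hypothetical image processing workflow (names are illustrative only).
workflow = {
    "copy_in": [],
    "blur_x": ["copy_in"],
    "gradient": ["copy_in"],
    "combine": ["blur_x", "gradient"],
    "copy_out": ["combine"],
}

levels = concurrency_levels(workflow)
for i, level in enumerate(levels):
    print(i, level)
```

In this toy workflow, `blur_x` and `gradient` land in the same level because neither depends on the other, which is the kind of opportunity for concurrent kernel execution the abstract refers to. Whether two kernels actually benefit from running concurrently on a GPU depends on their resource occupancy, which this sketch does not model.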
APA:
Qiao, B., Özkan, M.A., Teich, J., & Hannig, F. (2020). The Best of Both Worlds: Combining CUDA Graph with an Image Processing DSL. In Proceedings of the 57th Annual Design Automation Conference (DAC). San Francisco, CA, US: IEEE.
MLA:
Qiao, Bo, et al. "The Best of Both Worlds: Combining CUDA Graph with an Image Processing DSL." Proceedings of the 57th Annual Design Automation Conference (DAC), San Francisco, CA, IEEE, 2020.