Qiao B, Reiche O, Hannig F, Teich J (2019)
Publication Language: English
Publication Type: Conference contribution, Original article
Publication year: 2019
Pages Range: 242-253
Conference Proceedings Title: Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
Event location: Washington, DC, USA
ISBN: 978-1-7281-1436-1
Optimizing data-intensive applications such as image processing for GPU targets with complex memory hierarchies requires exploring the tradeoffs among locality, parallelism, and computation. Loop fusion, one of the classical optimization techniques, has proven effective at improving locality at the function level. Image processing algorithms are growing in complexity and generally consist of many kernels in a pipeline. The communication between kernels is intensive and offers another opportunity for locality improvement at the system level. This paper addresses an optimization technique called kernel fusion for improving data locality. We present a formal description of the problem by defining an objective function for locality optimization. By transforming the fusion problem into a graph partitioning problem, we propose a solution based on the minimum cut technique that searches for fusible kernels recursively. In addition, we develop an analytic model that quantitatively estimates the potential locality improvement by incorporating domain-specific knowledge and architecture details. The proposed technique is implemented in Hipacc, an image processing DSL and source-to-source compiler, and evaluated on six image processing applications on three Nvidia GPUs. Our experiments show a geometric mean speedup of up to 2.52.
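The recursive minimum-cut search mentioned in the abstract can be illustrated with a minimal sketch. This is one possible reading of the idea and not the paper's algorithm. Everything concrete in it is an assumption made for illustration: the kernel names, the edge weights (standing in for an estimated locality benefit of fusing two kernels), the `is_fusible` predicate (here a simple block-size limit in place of real legality and resource checks), and the brute-force minimum cut, which is only practical for tiny graphs.

```python
from itertools import combinations


def cut_weight(edges, part_a, part_b):
    """Sum the weights of all edges with one endpoint in each part."""
    return sum(
        w for (u, v), w in edges.items()
        if (u in part_a and v in part_b) or (u in part_b and v in part_a)
    )


def min_cut(nodes, edges):
    """Brute-force minimum cut of a block with at least two nodes.

    Exponential in the number of nodes; a stand-in for a proper
    minimum-cut algorithm, chosen only to keep the sketch short.
    """
    ordered = sorted(nodes)
    anchor, rest = ordered[0], ordered[1:]
    best = None
    # r extra nodes join the anchor; at least one node stays on the other side.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            part_a = {anchor, *extra}
            part_b = set(ordered) - part_a
            weight = cut_weight(edges, part_a, part_b)
            if best is None or weight < best[0]:
                best = (weight, part_a, part_b)
    return best


def partition(nodes, edges, is_fusible):
    """Split a block along its minimum cut until every block is fusible."""
    block = set(nodes)
    if len(block) == 1 or is_fusible(block):
        return [block]
    _, part_a, part_b = min_cut(block, edges)
    return (partition(part_a, edges, is_fusible)
            + partition(part_b, edges, is_fusible))


# Hypothetical four-kernel pipeline; weights are made-up fusion benefits.
edges = {("k0", "k1"): 5, ("k1", "k2"): 1, ("k2", "k3"): 4}
# Assumed legality rule: a fused block may hold at most two kernels.
blocks = partition({"k0", "k1", "k2", "k3"}, edges, lambda b: len(b) <= 2)
print([sorted(b) for b in blocks])  # [['k0', 'k1'], ['k2', 'k3']]
```

Cutting along the lightest edges first means the benefit that is given up by not fusing is kept small, which is the intuition behind casting fusion as graph partitioning.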
APA:
Qiao, B., Reiche, O., Hannig, F., & Teich, J. (2019). From Loop Fusion to Kernel Fusion: A Domain-specific Approach to Locality Optimization. In Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (pp. 242-253). Washington, DC, USA.
MLA:
Qiao, Bo, et al. "From Loop Fusion to Kernel Fusion: A Domain-specific Approach to Locality Optimization." Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), Washington, DC, USA, 2019, pp. 242-253.