Walter D, Adamtschuk T, Hannig F, Teich J (2024)
Publication Type: Conference contribution
Publication year: 2024
Conference Proceedings Title: Proceedings of the IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP)
Event location: Hong Kong
URI: https://ieeexplore.ieee.org/document/10631126
DOI: 10.1109/ASAP61560.2024.00029
Abstract:
LU decomposition is a widely used application for
solving systems of linear equations. It involves decomposing a given
matrix into a lower and upper triangular matrix. But if the matrix
size is large, using a block-based LU decomposition on smaller
submatrices can be advantageous. This approach allows for an
adaptation to a target architecture’s memory and computing
resources. In this paper, we analyze different strategies for
mapping block LU decompositions onto Tightly Coupled Processor
Arrays (TCPAs). Each decomposition introduces a dependence
graph of matrix operations of smaller size: an unblocked LU
decomposition, a triangular matrix solver, and a general matrix-
matrix multiplication. First, we propose one piecewise regular
algorithm for each corresponding loop nest, analyze its complexity,
and then explore various reuse schemes for configuration, data,
and synchronization. It will be shown that these schemes have a
significant impact on the execution time of the entire algorithm and
must be considered by the scheduling approach that governs the
individual loop program invocations. Our performance analysis of
the mapped block LU decomposition shows a maximal speedup of
12 on a TCPA of size 4 × 4 compared to a CPU, and a measured
speedup of 10 on an FPGA-based SoC.
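
The three block operations named in the abstract (an unblocked LU decomposition, a triangular matrix solve, and a general matrix-matrix multiplication) can be illustrated with a minimal sequential sketch in plain Python. This is not the paper's piecewise regular algorithm or its TCPA mapping. It assumes a square matrix, no pivoting, and nonzero pivots, and the function names (`lu_unblocked`, `trsm_row_panel`, `trsm_col_panel`, `gemm_update`, `block_lu`, `reconstruct`) are hypothetical names chosen for illustration.

```python
def lu_unblocked(A, lo, hi):
    """Unblocked LU of the diagonal block A[lo:hi, lo:hi], in place, no pivoting."""
    for p in range(lo, hi):
        for i in range(p + 1, hi):
            A[i][p] /= A[p][p]                  # multiplier, stored as entry of L
            for j in range(p + 1, hi):
                A[i][j] -= A[i][p] * A[p][j]


def trsm_row_panel(A, lo, hi, n):
    """Triangular solve L11 * U12 = A12 (L11 unit lower), overwriting A12 with U12."""
    for j in range(hi, n):
        for p in range(lo, hi):
            for i in range(p + 1, hi):
                A[i][j] -= A[i][p] * A[p][j]


def trsm_col_panel(A, lo, hi, n):
    """Triangular solve L21 * U11 = A21, overwriting A21 with L21."""
    for i in range(hi, n):
        for p in range(lo, hi):
            A[i][p] /= A[p][p]
            for j in range(p + 1, hi):
                A[i][j] -= A[i][p] * A[p][j]


def gemm_update(A, lo, hi, n):
    """Matrix-matrix multiplication update A22 -= L21 * U12 on the trailing submatrix."""
    for i in range(hi, n):
        for j in range(hi, n):
            A[i][j] -= sum(A[i][p] * A[p][j] for p in range(lo, hi))


def block_lu(A, b):
    """Block LU with block size b. Returns a copy holding L (below the
    diagonal, unit diagonal implied) and U (on and above the diagonal)."""
    A = [[float(x) for x in row] for row in A]  # work on a copy
    n = len(A)
    for lo in range(0, n, b):
        hi = min(lo + b, n)                     # last block may be smaller
        lu_unblocked(A, lo, hi)
        trsm_row_panel(A, lo, hi, n)
        trsm_col_panel(A, lo, hi, n)
        gemm_update(A, lo, hi, n)
    return A


def reconstruct(LU):
    """Multiply the packed factors back together to check L * U against A."""
    n = len(LU)
    return [[sum((1.0 if p == i else LU[i][p]) * LU[p][j]
                 for p in range(min(i, j) + 1))
             for j in range(n)] for i in range(n)]
```

Calling `block_lu(A, b)` with `b` equal to the matrix size reduces to the unblocked decomposition, so comparing that result with a smaller block size is a simple consistency check. Each outer iteration invokes the four kernels in a fixed order, which corresponds to the dependence graph of smaller matrix operations that the abstract describes.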
APA:
Walter, D., Adamtschuk, T., Hannig, F., & Teich, J. (2024). Analysis and Optimization of Block LU Decomposition for Execution on Tightly Coupled Processor Arrays. In Proceedings of the IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP). Hong Kong.
MLA:
Walter, Dominik, et al. "Analysis and Optimization of Block LU Decomposition for Execution on Tightly Coupled Processor Arrays." Proceedings of the 35th IEEE International Conference on Application-specific Systems, Architectures and Processors, Hong Kong 2024.