Generated domain-specific sparse data kernels for high-performance Lattice Boltzmann Methods

Suffa P, Rüde U (2023)


Publication Type: Conference contribution, Abstract of a poster

Publication year: 2023

Event location: Universität Stuttgart, Keplerstraße 7, 70174 Stuttgart, Germany

Abstract

We present an extension of the code generation pipeline for the domain-specific Lattice Boltzmann kernels within the framework lbmpy [3]. This extension enables sparse data kernels, i.e., kernels relying on indirect addressing. Unlike direct addressing kernels, where every lattice cell of the domain is stored in memory, sparse data kernels only store and compute on fluid cells, while boundary cells are omitted. This is especially effective for domains containing a large number of boundary cells, as typically occur in porous media simulations. Since these sparse data kernels are implemented in the lbmpy framework, they support a large variety of collision models and stencils. Additionally, the kernels can be generated for many common CPUs as well as for NVIDIA and AMD GPUs.
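The indirect addressing scheme described above can be illustrated with a minimal sketch (this is not lbmpy's generated code; the array names, the dummy neighbor pattern, and the small D2Q9 setup are illustrative assumptions):

```python
import numpy as np

# Sketch of a sparse (indirectly addressed) pull-streaming step.
# Only fluid cells are stored; an index list maps each (cell, direction)
# pair to the source entry in the flat PDF list.

q = 9        # number of PDFs per cell, e.g. a D2Q9 stencil
n_fluid = 5  # number of fluid cells kept in the sparse layout

# Flat PDF list: one row per fluid cell, one column per direction.
pdfs_src = np.random.rand(n_fluid, q)
pdfs_dst = np.empty_like(pdfs_src)

# Index list holding the neighbor information of the PDF list
# (here filled with a dummy periodic pattern for demonstration).
idx = (np.arange(n_fluid)[:, None] + np.arange(q)[None, :]) % n_fluid

# Pull streaming with indirect addressing: gather each PDF from the
# neighbor recorded in the index list instead of a fixed grid offset.
for c in range(n_fluid):
    for d in range(q):
        pdfs_dst[c, d] = pdfs_src[idx[c, d], d]
```

Because boundary cells never appear in the PDF list, memory footprint and work scale with the number of fluid cells rather than the bounding-box size of the domain.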
Furthermore, the sparse data kernels are extended to support an in-place streaming pattern, namely, the AA-pattern [2]. This method avoids storing a temporary PDF field. Thus we save memory and reduce the number of memory accesses of the LBM kernel from 3·q to 2·q PDF accesses, where q is the number of PDFs in the stencil. Compared to direct addressing kernels, the benefit of the AA-pattern for indirect addressing kernels is even greater because the index list, which is used to store neighboring information of the PDF list, is only accessed in every second time step, so the index list accesses in an AA-pattern sparse LBM kernel decrease from q − 1 to (q − 1)/2 [9]. This is particularly useful because large-scale simulations based on the LBM are, in general, memory-bound, so memory accesses are the performance bottleneck of this method. Therefore it is very effective to reduce the memory accesses by

    1 − (2·q + (q − 1)/2) / (3·q + q − 1) ≈ 37%.

Ideally, the performance could improve by the same factor.
These generated sparse data kernels are integrated into the massively parallel
multiphysics framework waLBerla [4]. waLBerla supports a wide range
of applications. In particular, it is employed in soft matter simulations with
ESPResSo [8]. Here, however, we use the LAGOON [7] test case with 188
million fluid cells on 720 cores of the JUWELS cluster [1] as a prototype scenario
taken from the EU HPC project SCALABLE [6] to simulate fluid flow around
a prototypical landing gear. See Figure 1, which shows the Q-criterion of the flow field.
Additional scaling experiments will demonstrate the strong scalability of waLBerla with the generated domain-specific sparse data kernels, including benchmarks on GPU clusters such as JUWELS booster [1] and LUMI [5]. Furthermore, the sparse data kernels will be optimized by enabling SIMD vectorization on CPUs and communication hiding for all architectures. Weak scalability will be demonstrated with a porous media simulation, while the LAGOON test case serves to show the strong scalability of the framework.

How to cite

APA:

Suffa, P., & Rüde, U. (2023). Generated domain-specific sparse data kernels for high-performance Lattice Boltzmann Methods. Poster presentation at International Conference on Data-Integrated Simulation Science (SimTech2023), Universität Stuttgart, Keplerstraße 7, 70174 Stuttgart, Germany.

MLA:

Suffa, Philipp, and Ulrich Rüde. "Generated domain-specific sparse data kernels for high-performance Lattice Boltzmann Methods." Presented at International Conference on Data-Integrated Simulation Science (SimTech2023), Universität Stuttgart, Keplerstraße 7, 70174 Stuttgart, Germany, 2023.
