Performance Tools


Description / Outline

The group develops open-source software in the areas of performance tools, cluster monitoring, and benchmarking.
In the area of “performance tools,” the group develops the well-known LIKWID tool collection (https://github.com/RRZE-HPC/likwid). It contains various tools for the controlled execution of applications on modern compute nodes with complex topology and adaptive runtime parameters. By measuring appropriate hardware metrics, LIKWID enables a detailed analysis of how application programs use the hardware and is therefore of central importance for validating performance models and identifying performance patterns. Derived metrics, such as the main memory bandwidth used, are computed from architecture-specific hardware events, so the tool must be continuously adapted and validated for new computer architectures.
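To illustrate what a derived metric is, the following minimal sketch computes a main memory bandwidth from two raw counter values and a measured runtime. It is not LIKWID code: the function name, the parameter names, and the 64-byte cache line size are assumptions made for this example, and the actual events and formulas differ between architectures.

```python
# Assumption: each counted memory transfer moves one 64-byte cache line.
CACHE_LINE_BYTES = 64


def memory_bandwidth_mbytes_per_s(read_lines, write_lines, runtime_s):
    """Derive a memory bandwidth in MBytes/s from raw event counts.

    read_lines and write_lines are hypothetical counter values for cache
    lines read from and written to main memory; runtime_s is the measured
    wall-clock time of the region in seconds.
    """
    if runtime_s <= 0:
        raise ValueError("runtime_s must be positive")
    bytes_moved = (read_lines + write_lines) * CACHE_LINE_BYTES
    return 1.0e-6 * bytes_moved / runtime_s


if __name__ == "__main__":
    # Made-up counter values, for illustration only.
    bw = memory_bandwidth_mbytes_per_s(
        read_lines=1_000_000, write_lines=500_000, runtime_s=0.01
    )
    print(f"{bw:.1f} MBytes/s")
```

The sketch shows why such metrics need validation on every new architecture: the result is only correct if the chosen events really count all memory traffic and the assumed transfer size matches the hardware.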
The static code analysis tool OSACA (Open Source Architecture Code Analyzer, https://github.com/RRZE-HPC/OSACA) analyzes assembly code and predicts its runtime within the processor core.
With ClusterCockpit (https://clustercockpit.org/), the group is developing a comprehensive HPC cluster monitoring solution. ClusterCockpit comprises the following components: cc-metric-collector (node agent on the compute nodes), cc-backend (REST API and web server backend including web-based user interface), cc-metric-store (in-memory metric database), cc-energy-manager (job-specific control of power capping settings, global power capping for a cluster), and cc-node-controller (setting system parameters at the node level). ClusterCockpit offers both job-centric and node-centric views and is accessible to regular HPC users, support staff, and administrators. It is in production use at a large number of HPC centers.
Benchmark applications are an important tool for understanding performance-limiting factors and exploring new optimization opportunities. They are used to characterize hardware platforms as well as in research and teaching. The group is developing “The Bandwidth Benchmark” (https://github.com/RRZE-HPC/TheBandwidthBenchmark), an application for measuring the maximum achievable bandwidth on all levels of the memory hierarchy. MD-Bench (https://github.com/RRZE-HPC/MD-Bench) implements state-of-the-art algorithms in the field of molecular dynamics for CPUs and GPUs, including scalable MPI parallelization. SparseBench implements solvers for sparse systems of equations, supports different sparse matrix storage formats, and is also MPI-parallel. MachineState (https://github.com/RRZE-HPC/MachineState) collects and stores all performance-related information at the node level, thus making an important contribution to reproducible benchmark results.
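The principle behind a bandwidth measurement can be sketched in a few lines: run a simple streaming kernel over a data set of known size, time it, and divide the bytes moved by the best runtime. The example below is only an illustration of this idea in Python and is not taken from The Bandwidth Benchmark; the function name and its parameters are hypothetical, and interpreter overhead as well as write-allocate traffic are ignored.

```python
import time


def measure_copy_bandwidth(n_bytes=64 * 1024 * 1024, repetitions=5):
    """Estimate the copy bandwidth in GBytes/s for a buffer of n_bytes.

    The copy kernel reads n_bytes and writes n_bytes, so 2 * n_bytes are
    counted per repetition. The best (shortest) time is used, as is common
    practice for bandwidth benchmarks.
    """
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repetitions):
        start = time.perf_counter()
        dst[:] = src  # streaming copy of the whole buffer
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from a coarse timer on tiny buffers.
    best = max(best, 1e-12)
    return 2 * n_bytes / best / 1e9


if __name__ == "__main__":
    print(f"copy bandwidth: {measure_copy_bandwidth():.2f} GBytes/s")
```

Varying the assumed buffer size so that it fits into different cache levels or only into main memory gives a rough picture of the bandwidth on each level of the memory hierarchy.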

Faculty/Institution