Open-source solutions for monitoring and system settings for energy-optimized data centers (Green-IT EE-HPC)

Third-party funded individual grant


Acronym: Green-IT EE-HPC

Start date: 01.09.2022

End date: 31.08.2025

Website: https://eehpc.clustercockpit.org/


Project details

Short description

The energy consumption of HPC data centers is a decisive factor in the procurement and operation of the systems. EE-HPC aims to make HPC systems more energy efficient through targeted, job-specific control and optimization of the hardware configuration and of the runtime environment settings.

Scientific Abstract

The aim of this project is to reduce power consumption while maximizing throughput in the operation of HPC systems. This is achieved by optimally adjusting the system parameters that influence energy consumption to the jobs currently running. To quantify the throughput of useful work, the Energy Productivity of the IT Equipment metric specified by KPI4DCE is used. The savings potential is demonstrated at each participating data center for two selected applications. The project combines a comprehensive job-specific measurement and control infrastructure with machine learning (ML) techniques and software-hardware co-design, including the ability to control energy parameters via runtime environments. Policies specify the boundary conditions, while the actual optimization of system parameters is automatic and adaptive. To achieve these goals, the open-source GEOPM framework will be extended with a machine learning component. To exploit the full potential for energy savings, automatic phase detection will be developed, along with extensions to the MPI and OpenMP runtime environments that allow information about the application state to be communicated to the GEOPM framework. To capture the required time-resolved metrics on energy consumption and on the performance behavior of the application, interfaces and extensions in LIKWID will be developed. For visualization and control of the GEOPM functionality, the job-specific performance monitoring framework ClusterCockpit will be extended and coupled with GEOPM. The novelty of the approach is the development and provision of a production-ready software environment for fully user-transparent energy optimization of HPC applications. The project builds on existing open-source software components and integrates, extends and adapts them to the new requirements.
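As an illustration of how an energy productivity figure could be derived from time-resolved power samples of a job, the following is a minimal sketch. It assumes the metric can be simplified to useful work divided by consumed energy; the exact KPI4DCE definition is not reproduced here, and the function names, the unit of useful work, and the sample values are hypothetical and not taken from the project.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (watts) over time (seconds) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def energy_productivity(useful_work: float, energy_j: float) -> float:
    """Simplified ratio (assumption): units of useful work per joule consumed."""
    if energy_j <= 0:
        raise ValueError("energy must be positive")
    return useful_work / energy_j


# Hypothetical job: three power samples taken 10 s apart.
timestamps = [0.0, 10.0, 20.0]
power = [100.0, 200.0, 100.0]
energy = energy_joules(timestamps, power)
# 6000 is a made-up amount of useful work (e.g. completed iterations).
productivity = energy_productivity(6000.0, energy)
```

In such a sketch, a parameter setting that lowers the consumed energy at unchanged useful work raises the ratio, which is the direction of optimization the abstract describes.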

Involved:

Contributing FAU Organisations:

Funding Source