Sustainability in high performance computing (HPC) is a major challenge not only for HPC centers and their users, but also for society as climate goals become stricter. Considerable effort has gone into reducing the energy consumption of systems in general. Although some efforts to optimize the energy efficiency of HPC workloads exist, most of them propose solutions targeting CPUs. As HPC systems shift increasingly toward GPU-centric architectures, simulation codes are adopting GPU programming models, which creates an urgent need to improve the energy efficiency of GPU-enabled codes. However, reducing the energy consumption of large-scale simulations executing on CPUs and GPUs has received insufficient attention. In this work, we enable accurate power and energy measurements using an open-source toolkit across a range of CPU+GPU node architectures. We apply this approach to SPH-EXA, an open-source GPU-centric astrophysical and cosmological simulation framework. We show that, with simple code instrumentation, users can accurately measure power- and energy-related data for their application, beyond the data provided by HPC systems alone. These accurate power and energy data give users significant insight for conducting energy-aware computational experiments and for future energy-aware code development.