Current HPC systems provide memory resources that are statically configured and tightly coupled with compute nodes. However, workloads on HPC systems are evolving, and their growing diversity creates a need for configurable memory resources to achieve high performance and utilization. In this study, we evaluate a memory subsystem design leveraging CXL-enabled memory pooling. We study two promising use cases of composable memory subsystems: fine-grained capacity provisioning and scalable bandwidth provisioning. We developed an emulator to explore the performance impact of various memory compositions. We also provide a profiler to identify memory usage patterns in applications and the optimization opportunities they present. We evaluate seven scientific and six graph applications on various emulated memory configurations. Three of the seven scientific applications saw less than a 10% performance impact when the pooled memory backed 75% of their memory footprint. The results also show that a dynamically configured high-bandwidth system can effectively support bandwidth-intensive, unstructured-mesh-based applications such as OpenFOAM. Finally, we identify interference through shared memory pools as a practical challenge for adoption on HPC systems.