As large-scale High Performance Computing (HPC) clusters approach exascale computing power, the gap between compute capability and storage system performance is widening. This is particularly problematic for the Weather Research and Forecasting Model (WRF), a widely used HPC application for high-resolution forecasting and research that produces sizable datasets, especially when analyzing transient weather phenomena. Despite this, the I/O modules within WRF have not been updated in the past ten years, resulting in subpar parallel I/O performance. This paper demonstrates the benefits of integrating ADIOS2, a next-generation parallel I/O framework, as a new I/O backend option in WRF. It details the challenges encountered during the integration and how they were addressed. The resulting I/O times show a more than tenfold improvement with ADIOS2 compared to traditional MPI-I/O-based solutions. The study also highlights the new features this makes available to WRF users worldwide: the Sustainable Staging Transport (SST), which enables the Unified Communication X (UCX) DataTransport, node-local burst buffer writes, and the in-line lossless compression capabilities of ADIOS2. In addition, it shows how ADIOS2's in-situ analysis capabilities can be integrated smoothly with a simple WRF forecasting pipeline, significantly reducing overall time to solution. The study serves as a reminder to maintainers of legacy HPC applications that incorporating modern libraries and tools can yield considerable performance gains with minimal changes to the core application.