We demonstrate the first climate-scale numerical ocean simulations improved through distributed, online inference of Deep Neural Networks (DNNs) using SmartSim. SmartSim is a library dedicated to enabling online analysis and Machine Learning (ML) for traditional HPC simulations. In this paper, we detail the SmartSim architecture and provide benchmarks, including online inference with a shared ML model on heterogeneous HPC systems. We demonstrate the capability of SmartSim by using it to run a 12-member ensemble of global-scale, high-resolution ocean simulations, each spanning 19 compute nodes, all communicating with the same ML architecture at each simulation timestep. In total, 970 billion inferences are collectively served by running the ensemble for a total of 120 simulated years. Finally, we show that our solution is stable over the full duration of the model integrations, and that the inclusion of machine learning has minimal impact on the simulation runtimes.
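To give a sense of scale, the figures quoted above can be combined with some simple arithmetic. The sketch below uses only the numbers stated in the abstract. The function name `ensemble_summary` and the dictionary keys are illustrative. The per-member values rest on two assumptions that the abstract does not confirm: that the 120 simulated years are an aggregate over the ensemble, and that the work is split evenly across members.

```python
# Figures quoted in the abstract.
ENSEMBLE_MEMBERS = 12
NODES_PER_MEMBER = 19
TOTAL_INFERENCES = 970e9      # 970 billion
TOTAL_SIMULATED_YEARS = 120


def ensemble_summary():
    """Derive rough scale figures from the numbers in the abstract."""
    return {
        # Nodes running the ocean simulations only; any nodes hosting
        # the shared ML model are not specified in the abstract.
        "simulation_nodes": ENSEMBLE_MEMBERS * NODES_PER_MEMBER,
        # Average number of inferences served per simulated year.
        "inferences_per_simulated_year": TOTAL_INFERENCES / TOTAL_SIMULATED_YEARS,
        # ASSUMPTION: the 120 years are an aggregate, split evenly.
        "years_per_member": TOTAL_SIMULATED_YEARS / ENSEMBLE_MEMBERS,
        # ASSUMPTION: inferences are split evenly across members.
        "inferences_per_member": TOTAL_INFERENCES / ENSEMBLE_MEMBERS,
    }


if __name__ == "__main__":
    for name, value in ensemble_summary().items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions, the ensemble occupies 228 simulation nodes and serves roughly 8 billion inferences per simulated year.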