Cognitive simulation (CogSim) is an important and emerging workflow for HPC scientific exploration and scientific machine learning (SciML). One challenging CogSim workload is the replacement of a component in a complex physical simulation with a fast, learned surrogate model that sits "inside" the computational loop. Executing this in-the-loop inference is particularly challenging because it requires frequent inference across multiple possible target models, can sit on the simulation's critical path (latency bound), must serve requests from multiple MPI ranks, and typically involves only a small number of samples per request. In this paper we explore the use of large, dedicated Deep Learning / AI accelerators that are disaggregated from the compute nodes for this CogSim workload. We compare the trade-offs of using these accelerators versus the node-local GPU accelerators on leadership-class HPC systems.
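To make the workload concrete, the following is a minimal sketch of in-the-loop surrogate inference inside an MPI-parallel simulation loop. The `Surrogate` class and `infer` method are hypothetical stand-ins for a trained model, not an API from this paper; the sketch only illustrates why each request is small-batch and latency bound.

```python
# Minimal sketch of in-the-loop surrogate inference under MPI.
# `Surrogate` is a hypothetical placeholder for a learned model that
# replaces one component of the physics; names are illustrative only.
import numpy as np
from mpi4py import MPI


class Surrogate:
    """Placeholder for a learned surrogate replacing one physics component."""

    def __init__(self, n_features: int, n_outputs: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((n_features, n_outputs))

    def infer(self, x: np.ndarray) -> np.ndarray:
        # Each request is small-batch: x has shape (n_samples, n_features)
        # with n_samples small, because every rank queries per timestep.
        return x @ self.w


def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    model = Surrogate(n_features=8, n_outputs=4)
    # Per-rank local state for the subdomain this rank owns.
    state = np.random.default_rng(rank).standard_normal((2, 8))

    for step in range(100):
        # ... advance the non-surrogate physics for this timestep ...

        # In-the-loop inference on the critical path: the timestep
        # cannot complete until the surrogate's outputs are available.
        y = model.infer(state)
        state[:, :4] = y  # fold surrogate outputs back into the state

        # Ranks synchronize each step, so the slowest inference
        # (local GPU or remote disaggregated accelerator) bounds
        # overall simulation throughput.
        comm.Barrier()


if __name__ == "__main__":
    main()
```

Run with, e.g., `mpirun -n 4 python sketch.py`: every rank issues its own small inference request each timestep, which is the access pattern that makes offloading to a disaggregated accelerator (versus a node-local GPU) an interesting trade-off.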