Cognitive simulation (CogSim) is an important and emerging workflow for HPC scientific exploration and scientific machine learning (SciML). One challenging CogSim workload is the replacement of a component in a complex physical simulation with a fast, learned surrogate model that sits "inside" the computational loop. Executing this in-the-loop inference is particularly challenging because it requires frequent inference across multiple possible target models, can sit on the simulation's critical path (latency bound), must serve requests from multiple MPI ranks, and typically involves only a small number of samples per request. In this paper we explore the use of large, dedicated Deep Learning / AI accelerators that are disaggregated from the compute nodes for this CogSim workload. We compare the trade-offs of using these accelerators versus the node-local GPU accelerators on leadership-class HPC systems.
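To make the workload concrete, the following is a minimal sketch of in-the-loop surrogate inference inside an MPI-parallel simulation loop. The `Surrogate` class and `infer` method are hypothetical stand-ins for a trained model, not an API from this paper; the sketch only illustrates why each request is small-batch and latency bound.

```python
# Minimal sketch of in-the-loop surrogate inference under MPI.
# `Surrogate` is a hypothetical placeholder for a learned model that
# replaces one component of the physics; names are illustrative only.
import numpy as np
from mpi4py import MPI


class Surrogate:
    """Placeholder for a learned surrogate replacing one physics component."""

    def __init__(self, n_features: int, n_outputs: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((n_features, n_outputs))

    def infer(self, x: np.ndarray) -> np.ndarray:
        # Each request is small-batch: x has shape (n_samples, n_features)
        # with n_samples small, because every rank queries per timestep.
        return x @ self.w


def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    model = Surrogate(n_features=8, n_outputs=4)
    # Per-rank local state for the subdomain this rank owns.
    state = np.random.default_rng(rank).standard_normal((2, 8))

    for step in range(100):
        # ... advance the non-surrogate physics for this timestep ...

        # In-the-loop inference on the critical path: the timestep
        # cannot complete until the surrogate's outputs are available.
        y = model.infer(state)
        state[:, :4] = y  # fold surrogate outputs back into the state

        # Ranks synchronize each step, so the slowest inference
        # (local GPU or remote disaggregated accelerator) bounds
        # overall simulation throughput.
        comm.Barrier()


if __name__ == "__main__":
    main()
```

Run with, e.g., `mpirun -n 4 python sketch.py`: every rank issues its own small inference request each timestep, which is the access pattern that makes offloading to a disaggregated accelerator (versus a node-local GPU) an interesting trade-off.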