Predicting multiphysics dynamics is computationally expensive and difficult because multi-scale, heterogeneous physical processes are tightly coupled. Neural surrogates promise a paradigm shift, but the field currently suffers from an "illusion of mastery", a concern raised repeatedly in prominent commentaries: existing evaluations rely too heavily on simplified, low-dimensional proxy problems, which fail to expose the models' inherent fragility in realistic regimes. To bridge this gap, we present REALM (REalistic AI Learning for Multiphysics), a rigorous benchmarking framework designed to test neural surrogates on challenging, application-driven reactive flows. REALM features 11 high-fidelity datasets, ranging from canonical multiphysics problems to complex propulsion and fire safety scenarios, together with a standardized end-to-end training and evaluation protocol that incorporates multiphysics-aware preprocessing and a robust rollout strategy. Using this framework, we systematically benchmark more than a dozen representative surrogate model families, including spectral operators, convolutional models, Transformers, pointwise operators, and graph/mesh networks, and identify three robust trends: (i) a scaling barrier governed jointly by dimensionality, stiffness, and mesh irregularity, which leads to rapidly growing rollout errors; (ii) performance that is controlled primarily by architectural inductive biases rather than parameter count; and (iii) a persistent gap between nominal accuracy metrics and physically trustworthy behavior, in which models with high correlation scores still miss key transient structures and integral quantities. Taken together, these results show that REALM exposes the limits of current neural surrogates on realistic multiphysics flows and offers a rigorous testbed to drive the development of next-generation physics-aware architectures.
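Trend (iii) can be illustrated with a small synthetic sketch. It does not use REALM data or REALM's actual metrics: the Gaussian "heat release" profile, the damped and broadened "prediction", and the helper names `pearson` and `trapezoid_integral` are all assumptions made for illustration. The point is only that a prediction can correlate strongly with the reference while underestimating both the peak and the integrated quantity.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D fields."""
    return float(np.corrcoef(a, b)[0, 1])

def trapezoid_integral(f, dx):
    """Trapezoidal integral of uniformly sampled values f with spacing dx."""
    return float(dx * (f.sum() - 0.5 * (f[0] + f[-1])))

# Hypothetical 1-D reference profile: a sharp Gaussian peak (not REALM data).
x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]
reference = np.exp(-((x - 0.5) / 0.05) ** 2)

# Hypothetical surrogate output: same location, but damped and slightly broadened.
prediction = 0.7 * np.exp(-((x - 0.5) / 0.055) ** 2)

corr = pearson(reference, prediction)

# Relative error in the peak value (a proxy for a missed transient structure).
peak_err = abs(prediction.max() - reference.max()) / reference.max()

# Relative error in the integrated quantity (e.g. a total heat release).
ref_int = trapezoid_integral(reference, dx)
pred_int = trapezoid_integral(prediction, dx)
integral_err = abs(pred_int - ref_int) / ref_int

print(f"correlation       = {corr:.3f}")
print(f"peak rel. error   = {peak_err:.2f}")
print(f"integral rel. err = {integral_err:.2f}")
```

In this toy case the correlation stays close to 1 while the peak and the integral are both off by a substantial fraction, which is why a correlation-style score alone may overstate how trustworthy a surrogate is.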