Highly accurate datasets from numerical or physical experiments are often expensive and time-consuming to acquire, which poses a significant challenge for applications that require precise evaluations, potentially across multiple scenarios and in real time. Even building sufficiently accurate surrogate models can be extremely challenging when high-fidelity data are limited. Conversely, less expensive low-fidelity data can be computed more easily and can cover a broader range of scenarios. Leveraging multi-fidelity information can therefore improve the predictive capabilities of surrogates. In practice, however, data may differ in type, come from sources of different modalities, and not be available concurrently, which further complicates the modeling process. To address these challenges, we introduce a progressive multi-fidelity surrogate model. This model sequentially incorporates diverse data types using tailored encoders. Neural networks then perform multi-fidelity regression from the encoded inputs to the target quantities of interest. Input information flows progressively from lower to higher fidelity levels through two sets of connections: concatenations among all the encoded inputs, and additive connections among the final outputs. This dual connection system enables the model to exploit correlations among different datasets while ensuring that each level makes an additive correction to the previous level without altering it. The approach prevents performance degradation as new input data are integrated into the model, and it automatically adapts predictions to the available inputs. We demonstrate the effectiveness of the approach on numerical benchmarks and a real-world case study, showing that it reliably integrates multi-modal data and provides accurate predictions, maintaining performance when generalizing across time and parameter variations.
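The dual connection system described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `ProgressiveSurrogate`, the helper functions, the latent size, and the use of untrained random linear maps in place of the tailored encoders and neural networks are all assumptions made for illustration. The sketch only shows the wiring, in which level k regresses from the concatenation of encodings 0..k and adds its output to the prediction of level k-1.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Random weight matrix; a hypothetical stand-in for a trained network."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def apply_linear(weights, x):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class ProgressiveSurrogate:
    """Illustrative wiring only: one encoder and one regression head per level."""

    def __init__(self, input_dims, latent_dim=4, output_dim=1, seed=0):
        rng = random.Random(seed)
        self.output_dim = output_dim
        # One encoder per fidelity level (assumed linear + tanh here).
        self.encoders = [make_linear(d, latent_dim, rng) for d in input_dims]
        # Head k receives the concatenated encodings of levels 0..k.
        self.heads = [
            make_linear(latent_dim * (k + 1), output_dim, rng)
            for k in range(len(input_dims))
        ]

    def predict(self, inputs):
        """Return the prediction after each available level, lowest fidelity first.

        `inputs` may contain fewer entries than there are levels; the model
        then simply stops at the last level for which data are available.
        """
        encoded = []
        output = [0.0] * self.output_dim
        per_level = []
        for k, x in enumerate(inputs):
            z = [math.tanh(v) for v in apply_linear(self.encoders[k], x)]
            encoded = encoded + z                                 # concatenation connection
            correction = apply_linear(self.heads[k], encoded)
            output = [o + c for o, c in zip(output, correction)]  # additive connection
            per_level.append(output)
        return per_level


model = ProgressiveSurrogate(input_dims=[3, 5, 2])
inputs = [[0.1, 0.2, 0.3], [1.0, 0.5, -0.5, 0.0, 0.2], [0.7, -0.1]]
two_levels = model.predict(inputs[:2])
three_levels = model.predict(inputs)
# Adding a third level leaves the first two predictions untouched.
assert three_levels[:2] == two_levels
```

Because each level only adds a correction on top of the previous output, supplying a new input modality cannot change the predictions already made at lower levels, and omitting the higher-fidelity inputs simply returns the lower-level prediction.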