A statistical emulator is a surrogate for a complex physical model that drastically reduces computational cost. Its successful implementation hinges on accurately representing a nonlinear response surface over a high-dimensional input space. Conventional "space-filling" designs, including random sampling and Latin hypercube sampling, become inefficient as the dimensionality of the input variables increases, and they are problematic in functional space. To address this fundamental challenge, we develop a reliable emulator for predicting complex functionals by active learning with error control (ALEC), which is applicable to infinite-dimensional mappings and delivers high-fidelity predictions with a controlled predictive error. We demonstrate its computational efficiency by emulating classical density functional theory (cDFT) calculations, a statistical-mechanical method widely used in modeling the equilibrium properties of complex molecular systems. We show that the ALEC emulator is much more accurate than both conventional Gaussian process emulators based on "space-filling" designs and another widely used active learning approach, and that it is computationally more efficient than direct cDFT calculations. The ALEC framework can serve as a reliable building block for emulating expensive functionals because of its reduced computational cost, controlled predictive error, and fully automatic features.
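The idea of active learning with a controlled predictive error can be illustrated with a small, generic sketch. It is not the ALEC algorithm from this work, whose details are not given in the abstract. It assumes a one-dimensional toy function as the "expensive" simulator, a Gaussian process with a fixed squared-exponential kernel, and a simple rule: if the predictive standard deviation at a query exceeds a tolerance, the simulator is run and the result is added to the training set; otherwise the emulator's prediction is used. All names (`simulator`, `Emulator`, `run_demo`) and all parameter values are hypothetical.

```python
import math
import random


def simulator(x):
    """Stand-in for an expensive model run (hypothetical toy function)."""
    return math.sin(3.0 * x) + 0.5 * x


def kernel(a, b, length_scale=0.3):
    """Squared-exponential covariance with unit prior variance (assumed fixed)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a small dense SPD matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i, row in enumerate(L):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


class Emulator:
    """Zero-mean Gaussian process emulator that grows its own training set."""

    def __init__(self, jitter=1e-8):
        self.xs, self.ys = [], []
        self.jitter = jitter
        self.L, self.z = [], []

    def add(self, x, y):
        """Add one simulator run and refactor the covariance matrix."""
        self.xs.append(x)
        self.ys.append(y)
        n = len(self.xs)
        K = [[kernel(self.xs[i], self.xs[j]) + (self.jitter if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.L = cholesky(K)
        self.z = solve_lower(self.L, self.ys)  # z = L^{-1} y

    def predict(self, x):
        """Return the predictive mean and standard deviation at x."""
        if not self.xs:
            return 0.0, 1.0  # prior mean and prior standard deviation
        v = solve_lower(self.L, [kernel(x, xi) for xi in self.xs])
        mean = sum(vi * zi for vi, zi in zip(v, self.z))
        var = max(0.0, 1.0 - sum(vi * vi for vi in v))
        return mean, math.sqrt(var)


def run_demo(n_queries=200, tol=0.05, seed=0):
    """Answer a stream of queries, calling the simulator only when needed."""
    rng = random.Random(seed)
    emu = Emulator()
    n_calls, max_err = 0, 0.0
    for _ in range(n_queries):
        x = rng.uniform(0.0, 2.0)
        mean, std = emu.predict(x)
        if std > tol:
            # Uncertainty too large: run the simulator and learn from it.
            y = simulator(x)
            emu.add(x, y)
            n_calls += 1
        else:
            y = mean
        # The truth is evaluated here only to report the demo's error.
        max_err = max(max_err, abs(y - simulator(x)))
    return n_calls, n_queries, max_err


if __name__ == "__main__":
    calls, queries, err = run_demo()
    print(f"simulator calls: {calls} of {queries} queries, max abs error: {err:.3g}")
```

In this sketch the tolerance is expressed relative to a unit prior variance, so it bounds the emulator's reported uncertainty rather than guaranteeing the true error. A practical emulator would also estimate the kernel parameters from data and handle functional (vector-valued) inputs and outputs.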