Locally induced Gaussian processes for large-scale simulation experiments

Gaussian processes (GPs) serve as flexible surrogates for complex surfaces, but buckle under the cubic cost of matrix decompositions when training data sets are large. Geospatial and machine learning communities suggest pseudo-inputs, or inducing points, as one strategy to obtain an approximation easing that computational burden. However, we show how the placement and number of inducing points can be thwarted by pathologies, especially in large-scale dynamic response surface modeling tasks. As a remedy, we suggest porting the inducing point idea, which is usually applied globally, over to a more local context where selection is both easier and faster. In this way, our proposed methodology hybridizes global inducing point and data subset-based local GP approximation. A cascade of strategies for planning the selection of local inducing points is provided, and comparisons are drawn to related methodology with emphasis on computer surrogate modeling applications. We show that local inducing points extend their global and data-subset component parts on the frontier of accuracy versus computational efficiency. Illustrative examples are provided on benchmark data and a large-scale real-simulation satellite drag interpolation problem.
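
To make the hybrid idea concrete, the following is a minimal sketch of a local inducing-point prediction at a single test location, not the authors' implementation. It assumes a squared-exponential kernel with fixed, untuned hyperparameters, picks the local data subset by nearest neighbors, and takes an evenly strided subset of those neighbors as inducing points, which is only one simple placeholder for the selection strategies the abstract mentions. The predictive mean uses the standard deterministic training conditional (DTC) approximation. The names `sq_exp_kernel` and `local_inducing_predict` and all default values are hypothetical.

```python
import numpy as np


def sq_exp_kernel(A, B, lengthscale):
    """Squared-exponential kernel between the rows of A (a x d) and B (b x d)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def local_inducing_predict(X, y, x_star, n_local=50, m=10,
                           lengthscale=0.1, noise=1e-4, jitter=1e-8):
    """Predictive mean at x_star from a local inducing-point (DTC) GP sketch."""
    # 1. Local data subset: the n_local training points nearest to x_star.
    order = np.argsort(np.linalg.norm(X - x_star, axis=1))[:n_local]
    Xn, yn = X[order], y[order]

    # 2. Local inducing points (assumption: an evenly strided subset of the
    #    distance-sorted neighbors; inducing points need not be data points).
    stride = max(len(Xn) // m, 1)
    Z = Xn[::stride][:m]
    m = len(Z)

    # 3. Covariances among inducing points, local data, and the test location.
    Kmm = sq_exp_kernel(Z, Z, lengthscale) + jitter * np.eye(m)
    Kmn = sq_exp_kernel(Z, Xn, lengthscale)
    Kms = sq_exp_kernel(Z, x_star[None, :], lengthscale)

    # 4. DTC mean, K*m (noise*Kmm + Kmn Knm)^{-1} Kmn y, computed in a
    #    Cholesky-whitened form so that only m x m systems are solved.
    L = np.linalg.cholesky(Kmm)               # Kmm = L L^T
    V = np.linalg.solve(L, Kmn)               # m x n whitened cross-covariance
    v_star = np.linalg.solve(L, Kms)[:, 0]    # whitened test covariance
    B = noise * np.eye(m) + V @ V.T
    return float(v_star @ np.linalg.solve(B, V @ yn))
```

Calling `local_inducing_predict(X, y, x_star)` with an `(N, d)` input array `X`, an `(N,)` response vector `y`, and a `(d,)` test location returns a scalar mean. Apart from the neighbor search, each prediction works only with n_local-by-m and m-by-m matrices, so the cubic cost in the full training size N never arises.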
