Least squares regression is a ubiquitous tool for building emulators (also known as surrogate models) of problems across science and engineering, for purposes such as design space exploration and uncertainty quantification. When the regression data are generated using an experimental design process (e.g., a quadrature grid) involving computationally expensive models, or when the data size is large, sketching techniques have shown promise in reducing the cost of constructing the regression model while retaining accuracy comparable to that obtained with the full data. However, random sketching strategies, such as those based on leverage scores, lead to regression errors that are random and may exhibit large variability. To mitigate this issue, we present a novel boosting approach that leverages cheaper, lower-fidelity data of the problem at hand to identify the best sketch among a set of candidate sketches. The selected sketch in turn specifies the sketch of the intended high-fidelity model and the associated data. We provide theoretical analyses of this bi-fidelity boosting (BFB) approach and discuss the conditions the low- and high-fidelity data must satisfy for boosting to succeed. In doing so, we derive a bound on the residual norm of the BFB sketched solution, relating it to its ideal, but computationally expensive, high-fidelity boosted counterpart. Empirical results on both manufactured and PDE data corroborate the theoretical analyses and illustrate the efficacy of the BFB solution in reducing the regression error, as compared to the non-boosted solution.
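The selection step described above can be illustrated with a minimal sketch. It assumes leverage-score row sampling with replacement and assumes that the "best" candidate is the one with the smallest full low-fidelity residual norm; the function names (`leverage_scores`, `draw_sketch`, `sketched_lstsq`, `bifidelity_boost`), the `eval_high` callback, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def leverage_scores(A):
    # Row leverage scores: squared row norms of the thin-QR factor Q.
    Q, _ = np.linalg.qr(A, mode="reduced")
    return np.sum(Q**2, axis=1)

def draw_sketch(probs, m, rng):
    # Sample m row indices with replacement and attach the usual
    # 1/sqrt(m * p_i) rescaling weights.
    rows = rng.choice(len(probs), size=m, replace=True, p=probs)
    weights = 1.0 / np.sqrt(m * probs[rows])
    return rows, weights

def sketched_lstsq(A, b_rows, rows, weights):
    # Solve the weighted least squares problem restricted to the sampled rows.
    SA = weights[:, None] * A[rows]
    Sb = weights * b_rows
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x

def bifidelity_boost(A, b_low, eval_high, m, n_candidates, rng):
    # Assumed selection rule: keep the candidate sketch whose sketched
    # low-fidelity solution has the smallest residual on ALL low-fidelity data.
    scores = leverage_scores(A)
    probs = scores / scores.sum()
    best = None
    for _ in range(n_candidates):
        rows, weights = draw_sketch(probs, m, rng)
        x_low = sketched_lstsq(A, b_low[rows], rows, weights)
        residual = np.linalg.norm(A @ x_low - b_low)
        if best is None or residual < best[0]:
            best = (residual, rows, weights)
    _, rows, weights = best
    # High-fidelity data are requested only at the rows of the chosen sketch.
    b_high_rows = eval_high(rows)
    return sketched_lstsq(A, b_high_rows, rows, weights), rows

# Synthetic stand-ins for the expensive and cheap models (illustrative only).
rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 400)
A = np.polynomial.legendre.legvander(t, 8)   # Legendre design matrix
b_high_full = np.exp(t) * np.sin(3 * t)      # pretend this is expensive
b_low = b_high_full + 0.05 * np.cos(5 * t)   # cheap, correlated surrogate
x_bfb, rows = bifidelity_boost(
    A, b_low, lambda r: b_high_full[r], m=40, n_candidates=10, rng=rng
)
print(np.linalg.norm(A @ x_bfb - b_high_full))
```

In this sketch the high-fidelity model is queried only through `eval_high`, so its cost scales with the sketch size `m` rather than with the number of rows of `A`. Because rows are sampled with replacement, a practical version would likely evaluate the expensive model once per unique row.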