Machine learning models are widely used to solve real-world problems in science and industry. To build robust models, we need to quantify the uncertainty of their predictions on new data. This study proposes a new method for uncertainty estimation based on a surrogate Gaussian process model. Our method can equip any base model with an accurate uncertainty estimate produced by a separate surrogate. Compared to other approaches, the estimate remains computationally efficient, since it requires training only one additional model, and it does not rely on data-specific assumptions. The only requirement is black-box access to the base model, a condition that typically holds in practice. Experiments on challenging time-series forecasting data show that surrogate model-based methods provide more accurate confidence intervals than bootstrap-based methods in both medium- and small-data regimes and across different families of base models, including linear regression, ARIMA, and gradient boosting.
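The abstract does not spell out how the surrogate is built, so the following is only a minimal sketch under our own assumptions, not the authors' method. It shows the general idea of wrapping a black-box base model: the base model is only ever called, never inspected; a Gaussian process with an RBF kernel (amplitude set from the base model's training outputs) supplies a variance that grows as queries move away from the training inputs; and a homoscedastic residual variance accounts for the base model's observed error. The names `SurrogateGP`, `rbf_kernel`, `predict_interval`, `length_scale`, and `noise` are hypothetical, and the kernel hyperparameters are fixed by hand rather than learned (for example by maximizing the marginal likelihood).

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel between two 1-D input arrays.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


class SurrogateGP:
    """Hypothetical GP surrogate that adds intervals to a black-box model."""

    def __init__(self, base_model, length_scale=1.0, noise=1e-2):
        self.base_model = base_model      # any callable: inputs -> predictions
        self.length_scale = length_scale  # fixed by hand (assumption)
        self.noise = noise                # jitter added to the kernel diagonal

    def fit(self, x_train, y_train):
        self.x_train = np.asarray(x_train, dtype=float)
        f_train = self.base_model(self.x_train)  # black-box queries only
        # Kernel amplitude taken from the spread of the base model's outputs.
        self.signal_var = float(np.var(f_train)) or 1.0
        # Residual variance of the base model (assumed homoscedastic).
        self.resid_var = float(np.mean((np.asarray(y_train) - f_train) ** 2))
        k = rbf_kernel(self.x_train, self.x_train,
                       self.length_scale, self.signal_var)
        k[np.diag_indices_from(k)] += self.noise
        self.chol = np.linalg.cholesky(k)  # K = L L^T
        return self

    def predict_interval(self, x_new, z=1.96):
        x_new = np.asarray(x_new, dtype=float)
        mean = self.base_model(x_new)  # point forecast stays the base model's
        k_star = rbf_kernel(x_new, self.x_train,
                            self.length_scale, self.signal_var)
        # Posterior variance: k(x, x) - k_*^T K^{-1} k_* = var - ||L^{-1} k_*||^2.
        v = np.linalg.solve(self.chol, k_star.T)
        gp_var = self.signal_var - np.sum(v ** 2, axis=0)
        std = np.sqrt(np.maximum(gp_var, 0.0) + self.resid_var)
        return mean, mean - z * std, mean + z * std
```

As a usage example, a linear fit from `np.polyfit` can play the role of the black-box base model: `base = np.poly1d(np.polyfit(x, y, 1))`, then `SurrogateGP(base).fit(x, y).predict_interval(x_new)`. Intervals for inputs far outside the training range come out wider than those inside it, which is the qualitative behavior a surrogate-based uncertainty estimate is meant to provide.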