For many practical, high-risk applications, it is essential to quantify the uncertainty in a model's predictions to avoid costly mistakes. While predictive uncertainty is widely studied for neural networks, the topic seems to be under-explored for models based on gradient boosting, even though gradient boosting often achieves state-of-the-art results on tabular data. This work examines a probabilistic ensemble-based framework for deriving uncertainty estimates in the predictions of gradient boosting classification and regression models. We conducted experiments on a range of synthetic and real datasets and investigated the applicability of ensemble approaches to gradient boosting models, which are themselves ensembles of decision trees. Our analysis shows that ensembles of gradient boosting models successfully detect anomalous inputs while having limited ability to improve estimates of total predictive uncertainty. Importantly, we also propose the concept of a \emph{virtual} ensemble, which obtains the benefits of an ensemble via only \emph{one} gradient boosting model and thereby significantly reduces computational complexity.
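The two ideas summarized above can be illustrated with a minimal sketch. Below, an explicit ensemble is a set of gradient boosting models trained with different random seeds, whose disagreement (variance across members) serves as a knowledge-uncertainty signal, while a virtual ensemble reuses truncated sub-models of a \emph{single} boosted model as pseudo-members. This sketch uses scikit-learn's `GradientBoostingRegressor` and its `staged_predict` API purely for illustration; the paper's actual implementation, model library, and uncertainty decomposition may differ, and the truncation schedule chosen here (every 10th stage after a burn-in of 50) is an assumption.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

# Explicit ensemble: several GBDT models differing only in random seed.
models = [
    GradientBoostingRegressor(n_estimators=100, random_state=s).fit(X, y)
    for s in range(5)
]
preds = np.stack([m.predict(X) for m in models])          # (5, n_samples)
ensemble_mean = preds.mean(axis=0)
disagreement = preds.var(axis=0)  # variance across members: knowledge uncertainty proxy

# Virtual ensemble sketch: truncated sub-models of ONE boosted model.
# staged_predict yields the prediction after each boosting stage, so
# selected stages can be treated as ensemble members at no extra training cost.
gbm = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)
stages = np.stack([
    p for i, p in enumerate(gbm.staged_predict(X))
    if i >= 50 and i % 10 == 0  # assumed burn-in and thinning schedule
])
virtual_disagreement = stages.var(axis=0)
```

Inputs far from the training distribution tend to produce larger disagreement among members, which is the anomaly-detection signal the abstract refers to; the virtual variant trades some member diversity for the cost of training a single model.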