For many practical, high-risk applications, it is essential to quantify uncertainty in a model's predictions to avoid costly mistakes. While predictive uncertainty is widely studied for neural networks, the topic seems to be under-explored for models based on gradient boosting. However, gradient boosting often achieves state-of-the-art results on tabular data. This work examines a probabilistic ensemble-based framework for deriving uncertainty estimates in the predictions of gradient boosting classification and regression models. We conducted experiments on a range of synthetic and real datasets and investigated the applicability of ensemble approaches to gradient boosting models that are themselves ensembles of decision trees. Our analysis shows that ensembles of gradient boosting models successfully detect anomalous inputs while having limited ability to improve the predicted total uncertainty. Importantly, we also propose a concept of a virtual ensemble to get the benefits of an ensemble via only one gradient boosting model, which significantly reduces complexity.
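To make the ensemble-based idea concrete, the following is a minimal sketch, not the authors' implementation: the synthetic dataset, the use of scikit-learn's GradientBoostingRegressor, the seed-based explicit ensemble, and the staged-prediction stand-in for a virtual ensemble are all illustrative assumptions.

```python
# A minimal sketch (assumed setup, not the paper's exact method): estimate
# predictive uncertainty from (a) an explicit ensemble of independently seeded
# gradient boosting models and (b) a "virtual ensemble"-style trick that reuses
# truncations of a single boosted model via staged predictions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_in = X[:5]          # in-distribution query points
X_out = X[:5] * 5.0   # crudely shifted points standing in for anomalous inputs

# (a) Explicit ensemble: several models differing only in their random seed.
members = [
    GradientBoostingRegressor(n_estimators=300, subsample=0.8, random_state=s).fit(X, y)
    for s in range(10)
]

def ensemble_uncertainty(models, X_query):
    """Mean prediction and between-member variance (a knowledge-uncertainty proxy)."""
    preds = np.stack([m.predict(X_query) for m in models])  # (n_models, n_points)
    return preds.mean(axis=0), preds.var(axis=0)

mean_in, var_in = ensemble_uncertainty(members, X_in)
mean_out, var_out = ensemble_uncertainty(members, X_out)
print("explicit ensemble variance, in-distribution vs. shifted:", var_in.mean(), var_out.mean())

# (b) Virtual-ensemble-style sketch from ONE model: treat the model truncated
# at different numbers of trees as separate ensemble members.
single = GradientBoostingRegressor(n_estimators=300, subsample=0.8, random_state=0).fit(X, y)
stages = np.stack(list(single.staged_predict(X_in)))  # (n_estimators, n_points)
virtual_members = stages[150::25]                     # later, spaced-out truncations
print("virtual ensemble variance (in-distribution):", virtual_members.var(axis=0).mean())
```

In both variants the spread between members is what signals unfamiliar inputs; the shifted points would typically yield a larger between-member variance than the in-distribution ones, mirroring the abstract's observation that such ensembles are useful mainly for detecting anomalous inputs, while the virtual variant extracts its members from a single trained model and thus avoids the cost of training many models.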