Gradient Boosting Machines (GBM) are hugely popular for solving tabular data problems. However, practitioners are not only interested in point predictions, but also in probabilistic predictions in order to quantify the uncertainty of the predictions. Creating such probabilistic predictions is difficult with existing GBM-based solutions: they either require training multiple models or they become too computationally expensive to be useful for large-scale settings. We propose Probabilistic Gradient Boosting Machines (PGBM), a method to create probabilistic predictions with a single ensemble of decision trees in a computationally efficient manner. PGBM treats the leaf weights in a decision tree as random variables, and approximates the mean and variance of each sample in a dataset via stochastic tree ensemble update equations. These learned moments allow us to subsequently sample from a specified distribution after training. We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods: (i) PGBM enables probabilistic estimates without compromising on point performance in a single model, (ii) PGBM learns probabilistic estimates via a single model only (and without requiring multi-parameter boosting), and thereby offers a speedup of up to several orders of magnitude over existing state-of-the-art methods on large datasets, and (iii) PGBM achieves accurate probabilistic estimates in tasks with complex differentiable loss functions, such as hierarchical time series problems, where we observed up to 10\% improvement in point forecasting performance and up to 300\% improvement in probabilistic forecasting performance.
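The post-training sampling step can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; the function name, signature, and the lognormal moment-matching branch are illustrative assumptions. It shows only the core idea stated above: once the ensemble has produced a mean and a variance per sample, a distribution of the practitioner's choice is fitted to those two moments and sampled from.

\begin{verbatim}
import numpy as np

def sample_predictive_distribution(mu, sigma2, n_samples=1000,
                                   dist="normal", seed=None):
    """Hypothetical helper: draw samples from an output distribution
    whose first two moments match the per-sample mean (mu) and
    variance (sigma2) produced by a trained ensemble."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    if dist == "normal":
        return rng.normal(mu, np.sqrt(sigma2), size=(n_samples, mu.size))
    if dist == "lognormal":
        # Moment matching: solve for the parameters of the underlying
        # normal so the lognormal has mean mu and variance sigma2.
        var_log = np.log1p(sigma2 / mu**2)
        mean_log = np.log(mu) - 0.5 * var_log
        return rng.lognormal(mean_log, np.sqrt(var_log),
                             size=(n_samples, mu.size))
    raise ValueError(f"unsupported distribution: {dist}")

# Example with illustrative per-sample moments from a trained model.
mu_hat = np.array([2.0, 5.0])
var_hat = np.array([0.5, 1.2])
samples = sample_predictive_distribution(mu_hat, var_hat, dist="lognormal")
print(samples.mean(axis=0))  # approximately recovers mu_hat
\end{verbatim}

Because the distribution is chosen only at sampling time, the same trained model can serve different distributional assumptions without retraining, which is what enables the single-model efficiency claims above.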