Gradient boosting is a prediction method that iteratively combines weak learners to produce a complex and accurate model. From an optimization point of view, the learning procedure of gradient boosting mimics gradient descent on a functional variable. This paper proposes to build upon the proximal point algorithm, which remains applicable when the empirical risk to minimize is not differentiable, in order to introduce a novel boosting approach called proximal boosting. Besides being motivated by non-differentiable optimization, the proposed algorithm benefits from algorithmic improvements such as controlling the approximation error and Nesterov's acceleration, in the same way as gradient boosting [Grubb and Bagnell, 2011; Biau et al., 2018]. This leads to two variants, respectively called residual proximal boosting and accelerated proximal boosting. Theoretical convergence is proved for the first two procedures under different hypotheses on the empirical risk, and the advantages of leveraging proximal methods for boosting are illustrated by numerical experiments on simulated and real-world data. In particular, we exhibit a favorable comparison with gradient boosting in terms of convergence rate and prediction accuracy.
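To fix ideas, the sketch below contrasts the two functional updates the abstract refers to, using hypothetical notation not taken from the paper: $C$ denotes the empirical risk, $F_t$ the model after $t$ boosting iterations, and $\gamma > 0$ a step size. Gradient boosting mimics the first update; proximal boosting mimics the second, which remains well defined when $C$ is not differentiable. The last line shows one common form of Nesterov's momentum applied to the proximal step, in the spirit of the accelerated variant.
% Hypothetical notation: C is the empirical risk, F_t the current model,
% gamma > 0 a step size. None of these symbols appear in the abstract.
\begin{align*}
  F_{t+1} &= F_t - \gamma \nabla C(F_t)
    && \text{(gradient descent; requires $\nabla C$)} \\
  F_{t+1} &= \operatorname{prox}_{\gamma C}(F_t)
    = \operatorname*{arg\,min}_{G} \Big\{ C(G) + \frac{1}{2\gamma}\,\lVert G - F_t \rVert^2 \Big\}
    && \text{(proximal point)} \\
  G_t &= F_t + \frac{t-1}{t+2}\,(F_t - F_{t-1}), \qquad
  F_{t+1} = \operatorname{prox}_{\gamma C}(G_t)
    && \text{(with Nesterov momentum)}
\end{align*}
In boosting, each functional update is further approximated within the class of weak learners; this projection error is what the residual variant aims to control.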