Gradient Boosted Decision Trees (GBDTs) are dominant machine learning algorithms for modeling discrete or tabular data. Unlike neural networks with millions of trainable parameters, GBDTs optimize the loss function in an additive manner and have a single trainable parameter per leaf, which makes it easy to apply high-order optimization to the loss function. In this paper, we introduce high-order optimization for GBDTs based on numerical optimization theory, which allows us to construct trees using high-order derivatives of a given loss function. In our experiments, we show that high-order optimization yields faster per-iteration convergence, which in turn reduces overall running time. Our solution is easily parallelized and runs on GPUs with little code overhead. Finally, we discuss potential future improvements, such as automatic differentiation of arbitrary loss functions and the combination of GBDTs with neural networks.
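To make the idea concrete, the following is a minimal sketch, not the paper's actual algorithm, of how a per-leaf weight can be derived from a higher-order Taylor expansion of the loss. It contrasts the standard second-order (Newton) leaf weight -G/(H+λ), as used in XGBoost, with a hypothetical third-order variant that minimizes the cubic Taylor model of the leaf objective. The function names, the root-selection rule, and the choice of logistic loss are all illustrative assumptions.

```python
import numpy as np

def leaf_value_second_order(g, h, lam=1.0):
    """Standard Newton (second-order) leaf weight: minimizes
    G*w + 0.5*(H + lam)*w^2 over the examples in the leaf."""
    G, H = g.sum(), h.sum()
    return -G / (H + lam)

def leaf_value_third_order(g, h, t, lam=1.0):
    """Hypothetical third-order leaf weight: minimizes the cubic
    Taylor model G*w + 0.5*(H + lam)*w^2 + (1/6)*T*w^3 by solving
    the stationarity condition G + (H + lam)*w + 0.5*T*w^2 = 0
    and keeping the root closest to the Newton step."""
    G, H, T = g.sum(), h.sum() + lam, t.sum()
    if abs(T) < 1e-12:           # third derivative vanishes: fall back to Newton
        return -G / H
    disc = H * H - 2.0 * T * G
    if disc < 0:                 # cubic model has no stationary point
        return -G / H
    r = np.sqrt(disc)
    w1, w2 = (-H + r) / T, (-H - r) / T
    newton = -G / H
    return w1 if abs(w1 - newton) < abs(w2 - newton) else w2

# Example: logistic loss derivatives for labels y in {0, 1}.
# g, h, t are the 1st, 2nd, and 3rd derivatives w.r.t. the margin f.
y = np.array([1.0, 0.0, 1.0, 1.0])
f = np.full_like(y, -0.2)        # current ensemble predictions
p = 1.0 / (1.0 + np.exp(-f))
g = p - y
h = p * (1.0 - p)
t = p * (1.0 - p) * (1.0 - 2.0 * p)
print(leaf_value_second_order(g, h), leaf_value_third_order(g, h, t))
```

In this sketch the third-order weight only adjusts the Newton step when the aggregated third derivative T is non-negligible, which is one plausible way such an update could converge faster per iteration while degrading gracefully to the second-order rule.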