Decision trees provide a rich family of highly non-linear yet efficient models; as a result, they remain the go-to family of predictive models for practitioners across domains. But learning trees is a challenging problem due to their highly discrete and non-differentiable decision boundaries. The state-of-the-art techniques use greedy methods that exploit the discrete tree structure but are tailored to specific problem settings (e.g., categorical vs. real-valued predictions). In this work, we propose a reformulation of the tree learning problem that provides better-conditioned gradients and leverages successful deep network learning techniques like overparameterization and straight-through estimators. Our reformulation admits an efficient and {\em accurate} gradient-based algorithm that allows us to deploy our solution in disparate tree learning settings like supervised batch learning and online learning with bandit feedback. Through extensive validation on standard benchmarks, we observe that in the supervised learning setting, our general method is competitive with, and in some cases more accurate than, existing methods that are designed {\em specifically} for the supervised setting. In contrast, for bandit settings, where most of the existing techniques are not applicable, our models are still accurate and significantly outperform the applicable state-of-the-art methods.
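To make the key differentiability idea concrete, below is a minimal sketch (not the paper's actual implementation) of how a straight-through estimator lets gradients pass through a hard decision-node split. It is written in Python with PyTorch; the node parameter names \verb|w| and \verb|b| and the helper \verb|st_split| are hypothetical, introduced only for illustration.

\begin{verbatim}
import torch

def st_split(x, w, b):
    """Hard routing decision at a tree node with a
    straight-through gradient.

    Forward pass: a hard 0/1 branch indicator (threshold on w.x + b).
    Backward pass: gradients flow through the smooth sigmoid surrogate.
    """
    logit = x @ w + b
    soft = torch.sigmoid(logit)   # differentiable surrogate
    hard = (logit > 0).float()    # discrete routing decision (no grad)
    # Straight-through trick: the forward value equals `hard`,
    # but backprop sees only the `soft` term's gradient.
    return hard + (soft - soft.detach())

# Tiny usage example on random data (hypothetical shapes).
x = torch.randn(4, 3)
w = torch.randn(3, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
route = st_split(x, w, b)   # values in {0, 1}, yet backprop-able
route.sum().backward()      # w.grad comes from the sigmoid surrogate
\end{verbatim}

The design choice here mirrors the abstract's claim: the model keeps discrete, tree-style decision boundaries at inference time while still exposing well-behaved gradients for end-to-end training.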