Tree ensembles are versatile supervised learning algorithms that achieve state-of-the-art performance. These models are extremely powerful but can grow to enormous sizes. As a result, tree ensembles are often post-processed to reduce memory footprint and improve interpretability. In this paper, we present ForestPrune, a novel optimization framework that post-processes tree ensembles by pruning depth layers from individual trees. We also develop a new block coordinate descent method to efficiently obtain high-quality solutions to optimization problems under this framework. Since the number of nodes in a decision tree increases exponentially with tree depth, pruning deep trees can drastically improve model parsimony. ForestPrune can substantially reduce the space complexity of an ensemble at minimal cost to performance. The framework supports various weighting schemes and has only a single hyperparameter to tune. In our experiments, we observe that ForestPrune can reduce model size 20-fold with negligible loss in performance.
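To make the depth-pruning idea concrete, the sketch below truncates fitted scikit-learn trees at a fixed depth cap, so that predictions fall back to an internal node's value once the cap is reached. This is a minimal illustration under stated assumptions, not the ForestPrune method itself: the helper `predict_truncated` is hypothetical, and the actual framework chooses which depth layers to keep per tree by solving an optimization problem rather than applying one uniform cap.

```python
# A minimal sketch (not the authors' implementation) of post-hoc depth-layer
# pruning: a fitted tree is evaluated as if every node at `max_depth` were a
# leaf, discarding all deeper layers.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

def predict_truncated(tree, X, max_depth):
    """Predict with `tree`, treating every node at `max_depth` as a leaf."""
    t = tree.tree_
    out = np.empty(X.shape[0])
    for i, x in enumerate(X):
        node, depth = 0, 0
        # Descend until a true leaf or the depth cap is reached.
        while t.children_left[node] != -1 and depth < max_depth:
            if x[t.feature[node]] <= t.threshold[node]:
                node = t.children_left[node]
            else:
                node = t.children_right[node]
            depth += 1
        out[i] = t.value[node][0][0]  # node's mean response
    return out

X, y = make_regression(n_samples=500, n_features=10, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Averaging depth-truncated trees approximates a pruned ensemble.
pruned_pred = np.mean(
    [predict_truncated(t, X, max_depth=4) for t in forest.estimators_],
    axis=0,
)
```

Because a binary tree of depth d holds up to 2^(d+1) - 1 nodes, capping depth at 4 leaves at most 31 nodes per tree, which gives intuition for how removing deep layers can yield the order-of-magnitude size reductions the abstract reports.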