Decision trees have driven tremendous progress in many fields in recent years. In simple terms, a decision tree applies a divide-and-conquer strategy: it breaks the complex problem of modeling the dependency between input features and labels into a sequence of smaller subproblems. While decision trees have a long history, recent advances have greatly improved their performance in computational advertising, recommender systems, information retrieval, and related areas. We introduce common tree-based models (e.g., Bayesian CART, Bayesian regression splines) and training techniques (e.g., mixed-integer programming, alternating optimization, gradient descent). Along the way, we highlight the probabilistic characteristics of tree-based models and explain their practical and theoretical benefits. Beyond machine learning and data mining, we also cover theoretical advances on tree-based models from other fields such as statistics and operations research. We list reproducible resources at the end of each method.
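The divide-and-conquer idea above can be illustrated with a minimal sketch of recursive splitting for 1-D regression. This is an illustrative toy, not the survey's method: all function names are hypothetical, and leaves simply predict the mean of their partition, as in CART-style regression trees.

```python
def best_split(xs, ys):
    """Pick the threshold on a 1-D feature minimizing total squared error."""
    best = None
    for t in sorted(set(xs))[1:]:  # candidate thresholds between unique values
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        sse = sum((y - sum(left) / len(left)) ** 2 for y in left) \
            + sum((y - sum(right) / len(right)) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t)
    return best[1]

def build_tree(xs, ys, depth=0, max_depth=2):
    """Divide: split the data at the best threshold; conquer: fit each part."""
    if depth == max_depth or len(set(xs)) < 2:
        return sum(ys) / len(ys)  # leaf node: predict the mean label
    t = best_split(xs, ys)
    left = [(x, y) for x, y in zip(xs, ys) if x < t]
    right = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return {"threshold": t,
            "left": build_tree(*zip(*left), depth + 1, max_depth),
            "right": build_tree(*zip(*right), depth + 1, max_depth)}

def predict(tree, x):
    """Route an input down the tree until a leaf value is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if x < tree["threshold"] else tree["right"]
    return tree
```

For example, `build_tree([1, 2, 3, 4], [0.0, 0.0, 10.0, 10.0])` first splits at the threshold 3, separating the low-label from the high-label points, then recurses on each half; each recursive call faces a strictly smaller subproblem, which is the essence of the divide-and-conquer strategy.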