Using the framework of boosting, we prove that all impurity-based decision tree learning algorithms, including the classic ID3, C4.5, and CART, are highly noise tolerant. Our guarantees hold under nasty noise, the strongest noise model, and we provide near-matching upper and lower bounds on the allowable noise rate. We further show that these algorithms, which are simple and have long been central to everyday machine learning, enjoy provable guarantees in the noisy setting that are unmatched by existing algorithms in the theoretical literature on decision tree learning. Taken together, our results add to an ongoing line of research that seeks to place the empirical success of these practical decision tree algorithms on firm theoretical footing.
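As a rough illustration of what "impurity-based" means for ID3, C4.5, and CART, the following sketch grows a tree top-down by repeatedly choosing the split that most reduces an impurity function G. It uses binary Gini impurity 2p(1-p) (the criterion associated with CART) or binary entropy (associated with ID3 and C4.5). This is a simplified sketch under stated assumptions: 0/1 features and labels, a fixed depth limit, no pruning, and plain impurity reduction rather than C4.5's gain ratio. It does not model nasty noise or reproduce the analysis described above, and the helper names (`best_split`, `build_tree`, `predict`) are hypothetical.

```python
import math
from itertools import product
from typing import Callable, List, Sequence, Tuple, Union

Impurity = Callable[[float], float]
Tree = Union[int, Tuple[int, "Tree", "Tree"]]  # leaf label, or (feature, left, right)


def gini(p: float) -> float:
    # Binary Gini impurity: 1 - p^2 - (1-p)^2 = 2p(1-p)
    return 2.0 * p * (1.0 - p)


def entropy(p: float) -> float:
    # Binary Shannon entropy in bits
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def impurity_of(labels: Sequence[int], G: Impurity) -> float:
    # Impurity of a node, computed from the fraction of 1-labels it contains
    return G(sum(labels) / len(labels)) if labels else 0.0


def best_split(X: Sequence[Sequence[int]], y: Sequence[int], G: Impurity) -> Tuple[int, float]:
    """Return (feature index, impurity reduction) of the best 0/1 split, or (-1, 0.0)."""
    n = len(y)
    parent = impurity_of(y, G)
    best_i, best_gain = -1, 0.0
    for i in range(len(X[0])):
        left = [y[j] for j in range(n) if X[j][i] == 0]
        right = [y[j] for j in range(n) if X[j][i] == 1]
        if not left or not right:
            continue  # a split with an empty side cannot reduce impurity
        children = (len(left) * impurity_of(left, G) + len(right) * impurity_of(right, G)) / n
        gain = parent - children
        if gain > best_gain:
            best_i, best_gain = i, gain
    return best_i, best_gain


def build_tree(X: List[List[int]], y: List[int], G: Impurity, max_depth: int) -> Tree:
    # Greedy top-down growth; leaves predict the majority label (ties go to 1)
    majority = 1 if 2 * sum(y) >= len(y) else 0
    if max_depth == 0 or len(set(y)) <= 1:
        return majority
    i, _ = best_split(X, y, G)
    if i < 0:
        return majority
    left = [j for j in range(len(y)) if X[j][i] == 0]
    right = [j for j in range(len(y)) if X[j][i] == 1]
    return (
        i,
        build_tree([X[j] for j in left], [y[j] for j in left], G, max_depth - 1),
        build_tree([X[j] for j in right], [y[j] for j in right], G, max_depth - 1),
    )


def predict(tree: Tree, x: Sequence[int]) -> int:
    # Walk from the root: go right when the split feature is 1, left otherwise
    while isinstance(tree, tuple):
        i, left, right = tree
        tree = right if x[i] else left
    return tree


# Usage: learn f(x) = x0 AND x1 from all 3-bit inputs (noise-free toy data)
X = [list(bits) for bits in product((0, 1), repeat=3)]
y = [x[0] & x[1] for x in X]
tree = build_tree(X, y, gini, max_depth=2)
print(tree)  # (0, 0, (1, 0, 1))
```

Swapping `gini` for `entropy` changes only the impurity function G; the greedy split-selection loop is shared, which is the sense in which these algorithms form a single impurity-based family.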