Decision tree algorithms use a gain function to select the best split during tree induction. The choice of this function is crucial for obtaining trees with high predictive accuracy. Some gain functions are biased when they compare splits of different arities. Quinlan introduced the gain ratio in C4.5 to correct this bias in the information gain function. In this paper, we present an updated version of the gain ratio that performs better because it attempts to correct the gain ratio's own bias toward unbalanced trees and toward some splits with low predictive interest.
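To make the arity bias concrete, the following minimal sketch computes the standard information gain and Quinlan's gain ratio for two candidate splits of the same toy data set. It illustrates only the classic C4.5 quantities, not the updated gain ratio proposed in this paper. The function names (`entropy`, `information_gain`, `split_information`, `gain_ratio`) and the toy labels are assumptions chosen for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, partition):
    """Entropy of the parent minus the weighted entropy of the child subsets."""
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in partition)
    return entropy(labels) - remainder


def split_information(labels, partition):
    """Entropy of the split itself: grows with the number of branches."""
    n = len(labels)
    return -sum(
        (len(part) / n) * math.log2(len(part) / n) for part in partition if part
    )


def gain_ratio(labels, partition):
    """Quinlan's gain ratio: information gain normalised by split information."""
    si = split_information(labels, partition)
    return information_gain(labels, partition) / si if si > 0 else 0.0


# Toy data (hypothetical): 8 examples, two balanced classes.
labels = ["yes"] * 4 + ["no"] * 4

# A binary split that separates the classes perfectly.
binary_split = [["yes"] * 4, ["no"] * 4]

# An 8-way split, e.g. on a unique identifier: one example per branch.
id_split = [[label] for label in labels]

# Information gain cannot tell the two splits apart (both gain 1 bit),
# whereas the gain ratio penalises the high-arity split.
print(information_gain(labels, binary_split), information_gain(labels, id_split))
print(gain_ratio(labels, binary_split), gain_ratio(labels, id_split))
```

In this example both splits remove all class uncertainty, so information gain rates them equally even though the identifier-like split would not generalise. Dividing by the split information lowers the score of the 8-way split to about one third of the binary split's score, which is the correction the gain ratio was designed to provide.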