Tree ensembles distribute feature importance evenly among groups of correlated features. This suppresses the average ranking of every feature in the correlated group, which reduces interpretability and complicates feature selection. In this paper we present ControlBurn, a feature selection algorithm that uses a weighted LASSO to prune unnecessary features from tree ensembles, just as low-intensity fire reduces overgrown vegetation. Like the linear LASSO, ControlBurn assigns all the feature importance of a correlated group of features to a single feature. Moreover, the algorithm is efficient and requires only a single training iteration, unlike iterative wrapper-based feature selection methods. We show that ControlBurn performs substantially better than feature selection methods with comparable computational cost on datasets with correlated features.
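The weighted-LASSO pruning idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it trains a forest, treats per-tree predictions as columns of a design matrix, and solves a nonnegative LASSO in which each tree is penalized in proportion to the number of distinct features it uses, so that the surviving trees (and hence the selected features) form a sparse set. All names and the weighting scheme here are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

# Toy data with a perfectly correlated (duplicated) feature.
X, y = make_regression(n_samples=300, n_features=5, noise=0.1, random_state=0)
X = np.hstack([X, X[:, [0]]])  # column 5 duplicates column 0

forest = RandomForestRegressor(n_estimators=50, max_depth=3,
                               random_state=0).fit(X, y)

# Per-tree predictions form the "design matrix" for the LASSO step.
P = np.column_stack([t.predict(X) for t in forest.estimators_])

def features_used(tree):
    """Distinct feature indices split on by a fitted tree (leaves are -2)."""
    f = tree.tree_.feature
    return set(f[f >= 0])

# Weight each tree by how many distinct features it uses, so
# feature-hungry trees pay a larger penalty (illustrative choice).
w = np.array([len(features_used(t)) for t in forest.estimators_], dtype=float)
w = np.maximum(w, 1.0)  # guard against degenerate single-leaf trees

# Dividing column j by w_j turns a plain LASSO penalty into a
# weighted one: penalizing |c_j| on the scaled column equals
# penalizing w_j * |coef on the original column|.
lasso = Lasso(alpha=1.0, positive=True, max_iter=5000)
lasso.fit(P / w, y)

kept = [i for i, c in enumerate(lasso.coef_) if c > 0]
selected = sorted(set().union(*(features_used(forest.estimators_[i])
                                for i in kept)))
print("trees kept:", len(kept), "features selected:", selected)
```

A real implementation would tune the penalty `alpha` along a regularization path and refit a final model on the selected features; the point of the sketch is only that a single LASSO solve, rather than an iterative wrapper loop, prunes the ensemble.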