While $\ell_2$ regularization is widely used in training gradient boosted trees, popular individualized feature attribution methods for trees, such as Saabas and TreeSHAP, overlook the training procedure. We propose Prediction Decomposition Attribution (PreDecomp), a novel individualized feature attribution method for gradient boosted trees trained with $\ell_2$ regularization. Theoretical analysis shows that the inner product between PreDecomp and the labels on in-sample data is essentially the total gain of a tree, and that PreDecomp can faithfully recover additive models in the population case when features are independent. Inspired by the connection between PreDecomp and total gain, we also propose TreeInner, a family of debiased global feature attributions defined, for each tree, as the inner product between any individualized feature attribution and the labels on out-of-sample data. Numerical experiments on a simulated dataset and a genomic ChIP dataset show that TreeInner achieves state-of-the-art feature selection performance. Code reproducing the experiments is available at https://github.com/nalzok/TreeInner.
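The TreeInner construction above can be sketched numerically. This is a minimal illustration with synthetic data, not the paper's exact definition: the attribution tensor `phi`, its shapes, and the sum-over-trees aggregation are all assumptions made for the sketch. Any per-tree individualized attribution (e.g., Saabas, TreeSHAP, or PreDecomp values restricted to a single tree) could stand in for `phi`.

```python
import numpy as np

# Hypothetical setup: phi[t, i, j] is the individualized attribution of
# feature j to sample i from tree t; y[i] are the out-of-sample labels.
rng = np.random.default_rng(0)
n_trees, n_samples, n_features = 3, 100, 5
phi = rng.normal(size=(n_trees, n_samples, n_features))
y = rng.normal(size=n_samples)

# TreeInner-style global score: for each tree, take the inner product
# between each feature's attribution column and the labels, then
# aggregate across trees (here, a plain sum).
global_attr = np.einsum("tij,i->j", phi, y)  # one score per feature
print(global_attr.shape)  # (5,)
```

Ranking features by `global_attr` (e.g., by absolute value) then yields the feature-selection ordering evaluated in the experiments.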