Random forests are widely used in part because they provide so-called importance measures, which give insight, at a global (per-dataset) level, into the relevance of input variables for predicting a given output. Methods based on Shapley values, on the other hand, have been introduced to refine the analysis of feature relevance in tree-based models to a local (per-instance) level. In this context, we first show that the global Mean Decrease of Impurity (MDI) variable importance scores correspond to Shapley values under some conditions. We then derive a local MDI importance measure of variable relevance, which has a very natural connection with the global MDI measure and can be related to a new notion of local feature relevance. We further link local MDI importances with Shapley values and discuss them in light of related measures from the literature. The measures are illustrated through experiments on several classification and regression problems.
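To make the notion of a Shapley value concrete, the following is a minimal sketch of the standard exact computation for a small set of input variables. It assumes a set function that assigns a relevance score to every subset of variables; the variable names `x1`, `x2`, `x3`, the helper `shapley_values`, and the scores in `TOY_SCORES` are hypothetical and chosen only for illustration. They are not taken from the experiments, and the sketch does not reproduce the conditions under which MDI scores coincide with Shapley values.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values of the set function `value` over `players`.

    `value` maps a frozenset of players to a number. The cost is
    exponential in len(players), so this only suits tiny examples.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = Fraction(0)
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical scores for every subset of three input variables.
# x3 is useless on its own and only helps once x1 and x2 are both known.
TOY_SCORES = {
    frozenset(): Fraction(0),
    frozenset({"x1"}): Fraction(1, 2),
    frozenset({"x2"}): Fraction(1, 4),
    frozenset({"x3"}): Fraction(0),
    frozenset({"x1", "x2"}): Fraction(3, 4),
    frozenset({"x1", "x3"}): Fraction(1, 2),
    frozenset({"x2", "x3"}): Fraction(1, 4),
    frozenset({"x1", "x2", "x3"}): Fraction(1),
}

phi = shapley_values(["x1", "x2", "x3"], TOY_SCORES.__getitem__)
for name, score in phi.items():
    print(name, score)
# x1 7/12
# x2 1/3
# x3 1/12
```

The three values sum to the score of the full variable set minus the score of the empty set (here 1), which is the efficiency property that makes Shapley values a natural way to split a global relevance score among input variables.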