The black-box nature of modern machine learning techniques creates a practical and ethical need for explainability. Feature importance aims to meet this need by assigning scores to features, so that humans can understand their influence on predictions. Feature importance can be used to explain predictions in different settings: over the entire sample space or for a specific instance; with respect to the model's behavior or to the dependencies in the data themselves. However, in most cases thus far, each of these settings has been studied in isolation. We attempt to develop a sound feature importance score framework by defining a small set of desirable properties. Surprisingly, we prove an inconsistency theorem, showing that these properties cannot all hold simultaneously. To overcome this difficulty, we propose the novel notion of re-partitioning the feature space into separable sets. Such sets are constructed so that features in different sets exhibit inter-set independence with respect to the target variable. We show that there exists a unique maximal partition into separable sets. Moreover, assigning scores to separable sets, instead of to single features, unifies the results of commonly used feature importance scores and eliminates the inconsistencies we demonstrated.