Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. Marginal contribution feature importance (MCI) was developed to break this trend by providing a useful framework for quantifying the relationships in data. In this work, we aim to improve upon the theoretical properties, performance, and runtime of MCI by introducing ultra-marginal feature importance (UMFI), which uses dependence removal techniques from the AI fairness literature as its foundation. We first propose axioms for feature importance methods that seek to explain the causal and associative relationships in data, and we prove that UMFI satisfies these axioms under basic assumptions. We then show on real and simulated data that UMFI performs better than MCI, especially in the presence of correlated interactions and unrelated features, while partially learning the structure of the causal graph and reducing the exponential runtime of MCI to super-linear.
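To make the mechanism concrete, the following is a minimal sketch of the UMFI computation under stated assumptions; it is not the paper's implementation. It assumes linear-regression residuals as the dependence-removal step (the paper draws on AI-fairness preprocessing, e.g. optimal transport), a random forest's cross-validated score as the evaluation function, and the names `umfi` and `remove_dependence` are hypothetical.

```python
# A minimal sketch of the UMFI idea, NOT the authors' implementation.
# Assumptions: linear-regression residuals stand in for dependence removal,
# and a random forest's cross-validated R^2 plays the role of the
# evaluation function v; `umfi` and `remove_dependence` are hypothetical names.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def remove_dependence(X_rest, f):
    """Return a copy of X_rest with the linear dependence of each column
    on the feature f removed (regression residuals)."""
    f = f.reshape(-1, 1)
    residuals = np.empty_like(X_rest)
    for j in range(X_rest.shape[1]):
        fit = LinearRegression().fit(f, X_rest[:, j])
        residuals[:, j] = X_rest[:, j] - fit.predict(f)
    return residuals


def umfi(X, y):
    """Score each feature by the performance gained when it is added back
    to a dependence-free representation of the remaining features."""
    X = np.asarray(X, dtype=float)
    scores = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        rest = np.delete(X, i, axis=1)
        rest_clean = remove_dependence(rest, X[:, i])    # features with f_i's influence removed
        with_f = np.column_stack([rest_clean, X[:, i]])  # ...plus f_i itself
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        v_with = cross_val_score(model, with_f, y, cv=3).mean()
        v_without = cross_val_score(model, rest_clean, y, cv=3).mean()
        scores[i] = max(v_with - v_without, 0.0)         # clip negative gains at zero
    return scores
```

Under these assumptions, the source of the claimed runtime reduction is visible in the loop: each feature requires one dependence-removal pass and two model evaluations, so the cost grows with the number of features, whereas MCI must search over feature subsets, which is exponential in the number of features.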