Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. Marginal feature importance methods, such as marginal contribution feature importance (MCI), attempt to break this trend by providing a useful framework for quantifying relationships in data in an interpretable fashion. In this work, we aim to improve upon the theoretical properties, performance, and runtime of MCI by introducing ultra-marginal feature importance (UMFI), which uses preprocessing methods from the AI fairness literature to remove dependencies in the feature set prior to model evaluation. We show on real and simulated data that UMFI performs at least as well as MCI, and significantly better in the presence of correlated interactions and unrelated features. In addition, UMFI partially learns the structure of the causal graph and reduces the exponential runtime of MCI to super-linear.
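As a concrete illustration of this idea, the minimal sketch below computes a UMFI-style score under two assumptions not fixed by the abstract: linear-regression residuals as the dependency-removal preprocessing (one simple choice of the kind drawn from the AI fairness literature) and cross-validated R^2 of a random forest as the evaluation function. The names `remove_dependence` and `umfi_score`, and the clamping of scores at zero, are hypothetical conveniences for this sketch, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def remove_dependence(X, i):
    # Replace every other column of X by its residual after linearly
    # regressing it on column i, so the remaining features carry no
    # linear information about feature i (assumed preprocessing choice).
    X_clean = X.astype(float).copy()
    xi = X[:, [i]]
    for j in range(X.shape[1]):
        if j != i:
            fit = LinearRegression().fit(xi, X[:, j])
            X_clean[:, j] = X[:, j] - fit.predict(xi)
    return X_clean

def umfi_score(X, y, i, cv=5):
    # UMFI-style importance of feature i: evaluate the model on the
    # dependency-removed feature set with and without feature i, and
    # take the difference in cross-validated score (clamped at zero here).
    X_clean = remove_dependence(X, i)
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    with_i = cross_val_score(model, X_clean, y, cv=cv).mean()
    without_i = cross_val_score(model, np.delete(X_clean, i, axis=1), y, cv=cv).mean()
    return max(with_i - without_i, 0.0)

# Toy usage: feature 0 drives y, feature 1 is a near-copy of feature 0,
# and feature 3 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=500)
y = X[:, 0] + X[:, 2] + 0.1 * rng.normal(size=500)
print([round(umfi_score(X, y, i), 3) for i in range(4)])
```

Note that, unlike MCI's maximization over exponentially many feature subsets, this sketch evaluates one preprocessed feature set per feature, which is where the claimed reduction from exponential to super-linear runtime comes from.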