Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. Marginal feature importance methods, such as marginal contribution feature importance (MCI), attempt to break this trend by providing a useful framework for quantifying the relationships in data in an interpretable fashion. In this work, we generalize the MCI framework by introducing ultra-marginal feature importance (UMFI), aiming to improve both its performance and its runtime. To do so, we prove that UMFI can be computed directly by applying preprocessing methods from the AI fairness literature to remove dependencies in the feature set. We show on real and simulated data that UMFI performs at least as well as MCI, with significantly better performance in the presence of correlated interactions and unrelated features, while substantially reducing the exponential runtime of MCI to super-linear.
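As a rough illustration of the idea, the sketch below computes a UMFI-style score for a single feature: the remaining features are preprocessed to remove their dependence on the feature of interest, and the feature's importance is the gain in model performance when it is added back to the preprocessed set. Here the dependency removal is a simple linear-regression residualization, one of the preprocessing choices the abstract alludes to; the function name `umfi_linear`, the random-forest evaluator, and the cross-validated R² scoring are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of an ultra-marginal feature importance score,
# assuming linear-regression residuals as the dependency-removal
# preprocessing. Names and evaluation choices are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def umfi_linear(X, y, j, cv=5):
    """UMFI-style importance of feature j (illustrative sketch).

    X: (n, p) feature matrix, y: (n,) response.
    Returns the cross-validated R^2 gain from adding feature j
    back to the dependency-removed remaining features.
    """
    xj = X[:, [j]]
    rest = np.delete(X, j, axis=1)

    # Preprocessing step: replace each remaining feature by its
    # residual after linearly regressing it on feature j, so the
    # preprocessed set carries (linearly) no information about j.
    resid = rest - LinearRegression().fit(xj, rest).predict(xj)

    def score(features):
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        return cross_val_score(model, features, y, cv=cv, scoring="r2").mean()

    # Importance = evaluation with j present minus with j absent.
    return score(np.hstack([resid, xj])) - score(resid)
```

When dependencies among features are nonlinear, a stronger removal method (the abstract points to preprocessing techniques from the AI fairness literature) would replace the linear residual step; the surrounding score-difference logic stays the same.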