PredDiff is a model-agnostic, local attribution method that is firmly rooted in probability theory. Its simple intuition is to measure prediction changes while marginalizing features. In this work, we clarify the properties of PredDiff and its connection to Shapley values. We stress important differences between classification and regression, which require a specific treatment within both formalisms. We extend PredDiff by introducing a new, well-founded measure of interaction effects between arbitrary feature subsets. The study of interaction effects is an indispensable step towards a comprehensive understanding of black-box models and is particularly important for scientific applications. In contrast to Shapley values, our novel measure maintains the original linear scaling and is thus generally applicable to real-world problems.
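To make the core intuition concrete, the following minimal sketch computes a single-feature PredDiff relevance for a regression model as the prediction on the intact sample minus the expected prediction after the feature has been marginalized. This is an illustrative sketch, not the authors' implementation: the names `preddiff_relevance`, `predict`, and `background` are hypothetical, the feature's marginal distribution is approximated by sampling imputation values from a background dataset, and the classification case (which, as noted above, requires a separate treatment, e.g. in terms of log-odds) is omitted.

```python
# Minimal sketch of the PredDiff intuition (regression case), assuming a
# marginal-imputation approximation. All names here are hypothetical.
import numpy as np

def preddiff_relevance(predict, x, feature_idx, background, rng=None):
    """Relevance of feature `feature_idx` for a single sample `x`:
    prediction on the intact sample minus the expected prediction
    when the feature is marginalized (imputed from `background`)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    # Draw imputation values for the occluded feature from the background data.
    imputations = rng.choice(background[:, feature_idx], size=len(background))
    x_occluded = np.tile(x, (len(imputations), 1))
    x_occluded[:, feature_idx] = imputations
    # E[f(x) | feature marginalized], approximated by the mean over imputations.
    marginalized = predict(x_occluded).mean(axis=0)
    return predict(x[None, :])[0] - marginalized

# Toy usage: a linear "model" where feature 0 matters and feature 1 does not.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    background = rng.normal(size=(500, 2))
    predict = lambda X: 3.0 * X[:, 0] + 0.0 * X[:, 1]
    x = np.array([1.0, 1.0])
    print(preddiff_relevance(predict, x, 0, background, rng=1))  # close to 3
    print(preddiff_relevance(predict, x, 1, background, rng=1))  # close to 0
```

Because each feature (or feature subset) only requires one additional set of imputed model evaluations, the cost of such a sketch grows linearly with the number of attributed subsets, which is the scaling behaviour referred to above.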