PredDiff is a model-agnostic, local attribution method that is firmly rooted in probability theory. Its core intuition is simple: measure how a prediction changes when feature variables are marginalized out. In this work, we clarify the properties of PredDiff and put forward several extensions of the original formalism. Most notably, we introduce a new measure for interaction effects. Accounting for interactions is an essential step towards a comprehensive understanding of black-box models. Importantly, our framework readily allows investigating interactions between arbitrary feature subsets and scales linearly with their number. We demonstrate the soundness of PredDiff relevances and interactions in both the classification and the regression setting, using analytic, synthetic, and real-world datasets.
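To make the marginalization idea concrete, the following is a minimal sketch and not the authors' implementation. It rests on several assumptions that the abstract does not state: features are marginalized by substituting values from a background sample (which treats them as independent of the remaining features, a simplification of a conditional distribution), relevances are computed on raw regression outputs, and the interaction between two feature subsets is taken to be their joint relevance minus the two individual relevances. The names `preddiff_relevance`, `preddiff_interaction`, `predict`, and `background` are hypothetical.

```python
import numpy as np


def preddiff_relevance(predict, x, background, features):
    """Prediction at x minus the mean prediction with `features` marginalized out.

    Assumption: marginalization replaces the chosen features with the values
    found in each row of `background`, ignoring dependence on other features.
    """
    features = list(features)
    # One copy of x per background row, with the chosen features overwritten.
    imputed = np.tile(x, (len(background), 1))
    imputed[:, features] = background[:, features]
    return predict(x[None, :])[0] - predict(imputed).mean()


def preddiff_interaction(predict, x, background, subset_a, subset_b):
    """Assumed interaction measure: joint relevance minus individual relevances."""
    joint = preddiff_relevance(
        predict, x, background, list(subset_a) + list(subset_b)
    )
    rel_a = preddiff_relevance(predict, x, background, subset_a)
    rel_b = preddiff_relevance(predict, x, background, subset_b)
    return joint - rel_a - rel_b


def predict(X):
    """Toy regression model: feature 0 is additive, features 1 and 2 interact."""
    return X[:, 0] + 2.0 * X[:, 1] * X[:, 2]


rng = np.random.default_rng(0)
background = rng.normal(size=(1000, 3))  # synthetic stand-in for training data
x = np.array([1.0, 1.0, 1.0])            # the sample being explained

rel_0 = preddiff_relevance(predict, x, background, [0])
int_01 = preddiff_interaction(predict, x, background, [0], [1])
int_12 = preddiff_interaction(predict, x, background, [1], [2])
print(f"relevance of feature 0:          {rel_0:.3f}")
print(f"interaction of features 0 and 1: {int_01:.3f}")
print(f"interaction of features 1 and 2: {int_12:.3f}")
```

In this toy model, features 0 and 1 never appear in the same product term, so their interaction under the assumed definition vanishes up to floating-point error, while features 1 and 2 yield a clearly nonzero value. Each interaction needs three relevance evaluations, so the cost of this sketch grows linearly with the number of subset pairs queried.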