PredDiff is a model-agnostic, local attribution method that is firmly rooted in probability theory. Its simple intuition is to measure how a prediction changes when features are marginalized out. In this work, we clarify properties of PredDiff and its close connection to Shapley values. We stress important differences between classification and regression, which require specific treatment within both formalisms. We extend PredDiff by introducing a new, well-founded measure for interaction effects between arbitrary feature subsets. The study of interaction effects is a necessary step towards a comprehensive understanding of black-box models and is particularly important for scientific applications. Equipped with our novel interaction measure, PredDiff is a promising model-agnostic approach for obtaining reliable, numerically inexpensive and theoretically sound attributions.
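The intuition stated in the abstract, measuring the change in a prediction when a feature subset is marginalized, can be illustrated with a minimal sketch for the regression case. This is not the authors' implementation. The function names `preddiff_relevance` and `preddiff_interaction`, the `background` array of reference rows, and the use of plain marginal imputation (replacing features with values from the background rows, ignoring dependence on the remaining features) are all assumptions made for illustration. The sign convention of the interaction score is likewise an assumption, and classification would need a separate treatment that is not shown here.

```python
import numpy as np


def preddiff_relevance(predict, x, background, subset):
    """Prediction at x minus the mean prediction with `subset` marginalized.

    Assumption: features in `subset` are imputed from the rows of
    `background` (marginal imputation), a simplification for illustration.
    """
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    subset = list(subset)
    # One copy of x per background row, with the subset columns overwritten.
    imputed = np.tile(x, (len(background), 1))
    imputed[:, subset] = background[:, subset]
    return predict(x[None, :])[0] - predict(imputed).mean()


def preddiff_interaction(predict, x, background, subset_a, subset_b):
    """Hypothetical interaction score: individual relevances minus joint relevance.

    The sign convention is an assumption of this sketch; the score is zero
    for a model that is additive across the two subsets.
    """
    joint = list(subset_a) + list(subset_b)
    m_a = preddiff_relevance(predict, x, background, subset_a)
    m_b = preddiff_relevance(predict, x, background, subset_b)
    m_ab = preddiff_relevance(predict, x, background, joint)
    return m_a + m_b - m_ab
```

Here `predict` is any callable that maps a 2-D array of shape `(n, d)` to `n` predictions, which is what keeps the sketch model-agnostic. For a model that is additive in two features the interaction score vanishes, whereas a product of the two features yields a non-zero score.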