Shapley values are a model-agnostic method for explaining model predictions. Many commonly used methods of computing Shapley values, known as off-manifold methods, rely on model evaluations on out-of-distribution input samples. Consequently, the resulting explanations are sensitive to model behaviour outside the data distribution, which may be irrelevant in practice. On-manifold methods that do not suffer from this problem have been proposed, but we show that they are overly dependent on the input data distribution and therefore produce unintuitive and misleading explanations. To circumvent these problems, we propose ManifoldShap, which respects the model's domain of validity by restricting model evaluations to the data manifold. We show, theoretically and empirically, that ManifoldShap is robust to off-manifold perturbations of the model and leads to more accurate and intuitive explanations than existing state-of-the-art Shapley methods.
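
The sensitivity of off-manifold methods described above can be illustrated with a small toy example. The sketch below is not the paper's estimator: the helper names (`shapley_values`, `marginal_value_fn`, `restricted_value_fn`), the two toy models, and the choice of the diagonal x0 = x1 as the "data manifold" are all assumptions made for illustration. It computes exact Shapley values for two models that agree on every data point but differ away from the data, first with a marginal (off-manifold) value function and then with a variant that discards hybrid samples falling outside a user-supplied region.

```python
import itertools
import math

import numpy as np


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    phi = np.zeros(n_features)
    n_fact = math.factorial(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for r in range(len(others) + 1):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(r) * math.factorial(n_features - r - 1) / n_fact
            for subset in itertools.combinations(others, r):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def _hybrids(x, background, subset):
    """Copy the background rows and overwrite the features in `subset` with x."""
    hybrid = background.copy()
    idx = np.array(sorted(subset), dtype=int)
    hybrid[:, idx] = x[idx]
    return hybrid


def marginal_value_fn(model, x, background):
    """Off-manifold value function: average the model over ALL hybrid samples."""
    def v(subset):
        return float(np.mean(model(_hybrids(x, background, subset))))
    return v


def restricted_value_fn(model, x, background, in_region):
    """Illustrative variant (an assumption, not the paper's definition):
    average only over hybrid samples that lie inside `in_region`."""
    def v(subset):
        hybrid = _hybrids(x, background, subset)
        mask = in_region(hybrid)
        if not mask.any():
            raise ValueError("no hybrid sample falls inside the region")
        return float(np.mean(model(hybrid[mask])))
    return v


# Toy data: both features are identical, so the data lie on the diagonal x0 == x1.
t = np.linspace(-1.0, 1.0, 5)
background = np.column_stack([t, t])
x = np.array([1.0, 1.0])  # the point to explain


def f_plain(z):
    return z[:, 0]


def f_perturbed(z):
    # Equals f_plain whenever z0 == z1, but differs off the diagonal.
    return z[:, 0] + 10.0 * (z[:, 0] - z[:, 1])


def on_diagonal(z):
    # Hypothetical membership test for the toy "manifold".
    return np.isclose(z[:, 0], z[:, 1])


phi_marginal_plain = shapley_values(marginal_value_fn(f_plain, x, background), 2)
phi_marginal_pert = shapley_values(marginal_value_fn(f_perturbed, x, background), 2)
phi_restr_plain = shapley_values(restricted_value_fn(f_plain, x, background, on_diagonal), 2)
phi_restr_pert = shapley_values(restricted_value_fn(f_perturbed, x, background, on_diagonal), 2)

print("marginal:  ", phi_marginal_plain, phi_marginal_pert)
print("restricted:", phi_restr_plain, phi_restr_pert)
```

In this toy setup the marginal attributions are (1, 0) for the first model and (11, -10) for the second, even though the two models give identical predictions on every data point. The restricted variant gives (0.5, 0.5) for both, because it never evaluates either model off the diagonal. The region test here is hand-written; in practice the domain of validity would have to be estimated from data, which this sketch does not attempt.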