Many methods for explaining black-box models, whether local or global, are additive. In this paper, we study global additive explanations of non-additive models, focusing on four explanation methods: partial dependence, Shapley explanations adapted to a global setting, distilled additive explanations, and gradient-based explanations. We show that different explanation methods characterize the non-additive components of a black-box model's prediction function in different ways. We use the concepts of main and total effects to anchor additive explanations, and quantitatively evaluate both additive and non-additive explanations. Although distilled explanations are generally the most accurate additive explanations, non-additive explanations, such as tree-based explanations that explicitly model non-additive components, tend to be even more accurate. Despite this, our user study showed that machine learning practitioners were better able to leverage additive explanations across a range of tasks. These trade-offs should be taken into account when deciding which explanation to trust and use to explain a black-box model.
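To make the notion of a global additive explanation concrete, the following is a minimal sketch (not the paper's implementation) of one-dimensional partial dependence, the first method named above. The helper `partial_dependence_1d` and the toy interaction model are hypothetical illustrations, assuming only NumPy; the example shows how an additive view can hide a pure interaction, which is exactly the kind of non-additive component the paper studies.

```python
import numpy as np

def partial_dependence_1d(predict, X, feature, grid_resolution=20):
    """Estimate partial dependence of `predict` on one feature.

    For each grid value v, every row's `feature` column is set to v and
    the model's predictions are averaged: PD_j(v) = E_x[f(x with x_j = v)].
    (Hypothetical helper for illustration, not the paper's code.)
    """
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_resolution)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value      # intervene on the chosen feature
        averages.append(predict(X_mod).mean())
    return grid, np.array(averages)

# Toy non-additive model: f(x) = x0 * x1, a pure interaction (assumed example).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
f = lambda X: X[:, 0] * X[:, 1]

grid, pd_curve = partial_dependence_1d(f, X, feature=0)
# With x1 ~ Uniform(-1, 1), E[x1] is approximately 0, so the partial
# dependence of x0 is nearly flat: the additive summary misses the interaction.
print(pd_curve.round(2))
```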