In this paper, we take a human-centered approach to interpretable machine learning. First, drawing inspiration from the study of explanation in philosophy, cognitive science, and the social sciences, we propose a list of design principles for machine-generated explanations that are meaningful to humans. Using the concept of weight of evidence from information theory, we develop a method for producing explanations that adhere to these principles. We show that this method can be adapted to handle high-dimensional, multi-class settings, yielding a flexible meta-algorithm for generating explanations. We demonstrate that these explanations can be estimated accurately from finite samples and are robust to small perturbations of the inputs. We also evaluate our method through a qualitative user study with machine learning practitioners, where we observe that the resulting explanations are usable even though some participants struggled with background concepts such as prior class probabilities. Finally, we conclude by surfacing design implications for interpretability tools.
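For reference, a minimal sketch of the weight-of-evidence quantity the abstract invokes, under the assumption that it is the standard information-theoretic log-likelihood-ratio definition; the symbols h (hypothesis, e.g. a candidate class) and e (evidence, e.g. an observed feature value) are introduced here for illustration and do not appear in the original text:

\[
\mathrm{woe}(h : e) \;=\; \log \frac{P(e \mid h)}{P(e \mid \neg h)}
\;=\; \log \frac{P(h \mid e)}{P(\neg h \mid e)} \;-\; \log \frac{P(h)}{P(\neg h)}.
\]

Under this reading, the weight of evidence is the amount by which the evidence shifts the log-odds of the hypothesis away from its prior log-odds, which is consistent with the abstract's observation that interpreting such explanations requires some comfort with prior class probabilities.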