Many applications of data-driven models demand transparency of decisions, especially in health care, criminal justice, and other high-stakes environments. Modern trends in machine learning research have produced algorithms so intricate that they are widely regarded as black boxes. In an effort to reduce the opacity of decisions, methods have been proposed to interpret the inner workings of such models in a human-comprehensible manner. These post hoc techniques are described as universal explainers, capable of faithfully augmenting decisions with algorithmic insight. Unfortunately, there is little agreement about what constitutes a "good" explanation. Moreover, current methods for evaluating explanations rely on either subjective or proxy measures. In this work, we propose a framework for evaluating post hoc explainers against ground truth that is derived directly from the additive structure of a model. We demonstrate the efficacy of the framework in understanding explainers by evaluating popular explainers on thousands of synthetic tasks and several real-world tasks. The framework reveals that explanations may be accurate yet misattribute the importance of individual features.
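The core idea, reading ground-truth attributions directly off a model's additive structure, can be illustrated with a minimal sketch. The generalized additive form, the component functions, and the naive perturbation "explainer" below are illustrative assumptions rather than the authors' implementation or any particular explainer's API.

```python
# Illustrative sketch (not the paper's code): if a model is additive,
# f(x) = f1(x1) + f2(x2) + f3(x3), each feature's ground-truth
# attribution is its own component, which any post hoc explainer's
# attributions can then be scored against.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical additive components chosen for illustration.
components = [np.sin, np.square, lambda x: 0.5 * x]

def model(X):
    # Additive model: sum of per-feature component functions.
    return sum(f(X[:, i]) for i, f in enumerate(components))

X = rng.normal(size=(200, 3))

# Ground-truth attributions: each feature's centered additive component.
ground_truth = np.column_stack(
    [f(X[:, i]) - f(X[:, i]).mean() for i, f in enumerate(components)]
)

def naive_attributions(X):
    # A toy stand-in for a post hoc explainer: attribute to feature i the
    # change in prediction when that feature is replaced by its mean.
    base = model(X)
    attrs = np.empty_like(X)
    for i in range(X.shape[1]):
        X_pert = X.copy()
        X_pert[:, i] = X[:, i].mean()
        attrs[:, i] = base - model(X_pert)
    return attrs

attrs = naive_attributions(X)

# Per-feature disagreement with the additive ground truth.
for i in range(X.shape[1]):
    err = np.mean(np.abs(attrs[:, i] - ground_truth[:, i]))
    print(f"feature {i}: mean abs attribution error = {err:.3f}")
```

In this toy setup, an explainer can track the model's predictions closely while still assigning a feature's contribution to the wrong component, which is the kind of misattribution the abstract refers to.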