可解释的机器学习:定义、方法和应用 (Interpretable machine learning: definitions, methods, and applications)

Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related, and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the Predictive, Descriptive, Relevant (PDR) framework for discussing interpretations. The PDR framework provides three overarching desiderata for evaluation: predictive accuracy, descriptive accuracy and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post-hoc categories, with sub-groups including sparsity, modularity and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often under-appreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.

翻译：机械学习模式在学习复杂模式方面证明取得了巨大成功,这些模式使得它们能够对未观测的数据作出预测。除了使用预测模型外,解释模型所学到的东西的能力正在受到越来越多的关注。然而,这种日益突出的焦点导致解释概念的极大混乱,特别是,不清楚提议的各种解释方法如何相互关联,以及可以使用哪些共同概念来评价这些解释方法。我们的目标是通过在机器学习的背景下界定解释性来解决这些问题,并采用预测性、描述性、相关性(PDR)框架来讨论解释问题。除了使用预测性、描述性、相关性(PDR)框架外,PDR框架为评估提供了三种总体的替代条件:预测性准确性、描述性准确性和相关性,与人类受众相比具有相关性。此外,为了帮助管理解释方法的模糊性,我们将现有技术分类为基于模型的类别和后合体类别,包括宽度、模块性和可模缩性。为了表明实践者如何利用PDR框架来评估和理解解释解释,我们提供了许多真实世界的例子。这些例子突出表明了评估的三种总体评估性:预测性、描述性准确性和相关性,我们最后将讨论基于常规性的工作限制,我们目前的工作框架,从而讨论基于人类理解的方法将产生的可能性范围。