Interpretability provides a means for humans to verify aspects of machine learning (ML) models and to support human+ML teaming in situations where the task cannot be fully automated. Different contexts require explanations with different properties. For example, the kind of explanation needed to determine whether an early cardiac arrest warning system is ready to be integrated into a care setting is very different from the kind a loan applicant needs to determine what actions might make their application successful. Unfortunately, there is a lack of standardization when it comes to properties of explanations: different papers may use the same term to mean different quantities, and different terms to mean the same quantity. This lack of standardized terminology and categorization of the properties of ML explanations prevents us both from rigorously comparing interpretable machine learning methods and from identifying which properties are needed in which contexts. In this work, we survey properties defined in interpretable machine learning papers, synthesize them based on what they actually measure, and describe the trade-offs between different formulations of these properties. In doing so, we enable more informed selection of task-appropriate formulations of explanation properties as well as standardization for future work in interpretable machine learning.