Machine learning models that offer excellent predictive performance often lack the interpretability necessary to support integrated human-machine decision-making. In clinical medicine and other high-risk settings, domain experts may be unwilling to trust model predictions without explanations. Work in explainable AI must balance competing objectives along two axes: 1) Explanations must balance faithfulness to the model's decision-making with plausibility to a domain expert. 2) Domain experts desire both local explanations of individual predictions and global explanations of behavior in aggregate. We propose to train a proxy model that mimics the behavior of the trained model and provides fine-grained control over these trade-offs. We evaluate our approach on the task of assigning ICD codes to clinical notes to demonstrate that explanations from the proxy model are faithful and replicate the trained model's behavior.
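The abstract only sketches the approach; one way to read "train a proxy model that mimics the behavior of the trained model" is a distillation-style setup in which an interpretable proxy is fit to the trained model's predicted probabilities alongside the gold labels. The sketch below illustrates that idea under stated assumptions: the bag-of-words representation, the vocabulary and label dimensions, the `alpha` weighting, and all module names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumptions throughout): distilling a black-box multi-label
# ICD coder into an interpretable linear bag-of-words proxy.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 5000   # assumed bag-of-words vocabulary size
NUM_CODES = 50      # assumed number of ICD codes (multi-label)

class LinearProxy(nn.Module):
    """Interpretable proxy: one weight per (token, code) pair, so each
    prediction can be explained by the highest-weighted tokens."""
    def __init__(self, vocab_size: int, num_codes: int):
        super().__init__()
        self.linear = nn.Linear(vocab_size, num_codes)

    def forward(self, bow: torch.Tensor) -> torch.Tensor:
        return self.linear(bow)  # one logit per ICD code

def distillation_loss(proxy_logits, teacher_probs, labels, alpha=0.5):
    """Blend fidelity to the trained model (its probabilities as soft targets)
    with accuracy on gold labels; alpha is an assumed knob for the trade-off."""
    fidelity = F.binary_cross_entropy_with_logits(proxy_logits, teacher_probs)
    accuracy = F.binary_cross_entropy_with_logits(proxy_logits, labels)
    return alpha * fidelity + (1.0 - alpha) * accuracy

# Toy training loop over random tensors standing in for clinical notes.
proxy = LinearProxy(VOCAB_SIZE, NUM_CODES)
optimizer = torch.optim.Adam(proxy.parameters(), lr=1e-3)

bow = torch.rand(32, VOCAB_SIZE)                     # note representations
teacher_probs = torch.rand(32, NUM_CODES)            # trained model's probabilities
labels = (torch.rand(32, NUM_CODES) > 0.9).float()   # gold ICD codes

for _ in range(10):
    optimizer.zero_grad()
    loss = distillation_loss(proxy(bow), teacher_probs, labels)
    loss.backward()
    optimizer.step()

# Local explanation for one note and one code: top contributing tokens.
code = 0
contributions = proxy.linear.weight[code] * bow[0]
top_tokens = contributions.topk(5).indices  # indices into the assumed vocabulary
```

In this reading, the proxy's per-token weights give global explanations of aggregate behavior, the per-note contribution scores give local explanations of individual predictions, and the `alpha` term is one way a single hyperparameter could trade fidelity to the trained model against accuracy on the gold labels.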