Interpretable machine learning offers insights into which factors drive a particular prediction of a black-box system and whether that system can be trusted for high-stakes decisions or large-scale deployment. Existing methods mainly focus on selecting explanatory input features and follow either locally additive or instance-wise approaches. Additive models use heuristically sampled perturbations to learn instance-specific explainers sequentially; this process is inefficient and susceptible to poorly-conditioned samples. Instance-wise techniques, in contrast, directly learn local sampling distributions and can leverage global information from other inputs. However, they can only interpret single-class predictions and are inconsistent across different settings because they rely strictly on a pre-defined number of selected features. This work exploits the strengths of both approaches and proposes a global framework that learns local explanations simultaneously for multiple target classes. We also propose an adaptive inference strategy to determine the optimal number of features for a specific instance. Our model explainer significantly outperforms additive and instance-wise counterparts on faithfulness while achieving a high level of brevity across various data sets and black-box model architectures.