As neural networks increasingly make critical decisions in high-stakes settings, monitoring and explaining their behavior in an understandable and trustworthy manner is a necessity. One commonly used type of explainer is post hoc feature attribution, a family of methods that assign each input feature a score corresponding to its influence on a model's output. A major practical limitation of this family of explainers is that different methods can disagree on which features are more important than others. Our contribution in this paper is a method for training models with this disagreement problem in mind. We do this by introducing a Post hoc Explainer Agreement Regularization (PEAR) loss term alongside the standard term corresponding to accuracy; the additional term measures the difference in feature attributions produced by a pair of explainers. On three datasets, we observe that training with this loss term improves explanation consensus on unseen data, and that the improved consensus extends to explainers not used in the loss term. We examine the trade-off between improved consensus and model performance, and finally, we study the influence our method has on feature attribution explanations.
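To make the abstract's loss formulation concrete, the following is a minimal PyTorch sketch of a PEAR-style objective. It assumes details not stated above: the explainer pair (vanilla gradients and input-times-gradient), the agreement metric (cosine similarity), and the mixing weight `lam` are illustrative choices, not necessarily the paper's exact configuration.

```python
# Minimal sketch of a PEAR-style training loss, based only on the abstract.
# The explainer pair, agreement metric, and weight `lam` are assumptions.
import torch
import torch.nn.functional as F


def pear_loss(model, x, y, lam=0.5):
    """Standard accuracy term plus an explainer-agreement penalty."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)

    # Differentiable attributions w.r.t. the predicted-class score.
    scores = logits.gather(1, logits.argmax(dim=1, keepdim=True)).sum()
    grads = torch.autograd.grad(scores, x, create_graph=True)[0]
    attr_a = grads          # vanilla gradient attribution (assumed explainer 1)
    attr_b = grads * x      # input-times-gradient attribution (assumed explainer 2)

    # Disagreement = 1 - cosine similarity between the two attribution maps.
    disagreement = 1.0 - F.cosine_similarity(
        attr_a.flatten(1), attr_b.flatten(1), dim=1
    ).mean()

    # Convex combination of the accuracy term and the agreement term.
    return (1.0 - lam) * task_loss + lam * disagreement
```

In this sketch, setting `lam = 0` recovers ordinary cross-entropy training, while larger values trade some task accuracy for closer agreement between the two attribution maps, mirroring the consensus-versus-performance trade-off discussed above.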