While there is increasing concern about the interpretability of neural models, the evaluation of interpretability remains an open problem due to the lack of proper evaluation datasets and metrics. In this paper, we present a novel benchmark to evaluate the interpretability of both neural models and saliency methods. This benchmark covers three representative NLP tasks: sentiment analysis, textual similarity, and reading comprehension, each provided with both English and Chinese annotated data. To precisely evaluate interpretability, we provide token-level rationales that are carefully annotated to be sufficient, compact, and comprehensive. We also design a new metric, i.e., the consistency between the rationales before and after perturbations, to uniformly evaluate interpretability across different types of tasks. Based on this benchmark, we conduct experiments on three typical models with three saliency methods, and unveil their strengths and weaknesses in terms of interpretability. We will release this benchmark at https://www.luge.ai/#/luge/task/taskDetail?taskId=15 and hope it can facilitate research on building trustworthy systems.
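The abstract only sketches the perturbation-based consistency metric. As a rough illustration, the snippet below shows one plausible way to score consistency between the rationale extracted before and after a perturbation, assuming rationales are represented as sets of token indices and consistency is measured as token-level F1 overlap; this is an assumption for illustration, not necessarily the paper's exact formulation.

```python
def rationale_consistency(rationale_before, rationale_after):
    """Token-level F1 overlap between two rationales (sets of token indices).

    Illustrative sketch only: the benchmark's actual consistency metric may
    differ, e.g., it may align tokens between the original and perturbed inputs.
    """
    before, after = set(rationale_before), set(rationale_after)
    if not before and not after:
        return 1.0  # both rationales empty: treat as perfectly consistent
    overlap = len(before & after)
    if overlap == 0:
        return 0.0
    precision = overlap / len(after)
    recall = overlap / len(before)
    return 2 * precision * recall / (precision + recall)


# Example: rationale token indices identified before and after a perturbation.
print(rationale_consistency({1, 2, 5, 7}, {1, 2, 6, 7}))  # 0.75
```

Under this reading, a saliency method is more interpretable when small, meaning-preserving perturbations leave its rationale largely unchanged, yielding a consistency score close to 1.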