A Sentiment Analysis Dataset for Trustworthiness Evaluation

While deep learning models have greatly improved performance on most artificial intelligence tasks, they are often criticized as untrustworthy because of the black-box problem. Consequently, many works have studied the trustworthiness of deep learning. However, because most open datasets are designed to evaluate the accuracy of model outputs, appropriate datasets for evaluating the inner workings of neural networks are still lacking, and this gap hinders the development of trustworthiness research. To systematically evaluate the factors involved in building trustworthy systems, we propose a novel, well-annotated sentiment analysis dataset for evaluating robustness and interpretability. The dataset contains diverse annotations covering a challenging distribution of instances, manually constructed adversarial instances, and sentiment explanations. We further propose several evaluation metrics for interpretability and robustness. Based on the dataset and metrics, we conduct comprehensive comparisons of the trustworthiness of three typical models and study the relations between accuracy, robustness, and interpretability. We release this trustworthiness evaluation dataset at \url{https://github/xyz} and hope our work can facilitate progress toward building more trustworthy systems for real-world applications.

