While deep learning models have greatly improved the performance of most artificial intelligence tasks, they are often criticized as untrustworthy due to their black-box nature. Consequently, many works have been proposed to study the trustworthiness of deep learning. However, as most open datasets are designed to evaluate the accuracy of model outputs, there is still a lack of appropriate datasets for evaluating the inner workings of neural networks. This lack of datasets clearly hinders the development of trustworthiness research. Therefore, to systematically evaluate the factors involved in building trustworthy systems, we propose a novel, well-annotated sentiment analysis dataset for evaluating robustness and interpretability. To support this evaluation, our dataset contains diverse annotations covering challenging instance distributions, manually crafted adversarial instances, and sentiment explanations. We further propose several evaluation metrics for interpretability and robustness. Based on the dataset and metrics, we conduct comprehensive comparisons of the trustworthiness of three typical models, and also study the relations among accuracy, robustness, and interpretability. We release this trustworthiness evaluation dataset at \url{https://github/xyz} and hope our work can facilitate progress toward building more trustworthy systems for real-world applications.