"Why is this misleading?": Detecting News Headline Hallucinations with Explanations

Automatic headline generation enables users to comprehend ongoing news events promptly and has recently become an important task in web mining and natural language processing. With the growing need for news headline generation, we argue that the hallucination issue, namely generated headlines not being supported by the original news stories, is a critical challenge for deploying this feature in web-scale systems. Meanwhile, because hallucination cases are infrequent and raters must read carefully to reach the correct consensus, it is difficult to acquire, through human curation, a dataset large enough to train a model to detect such hallucinations. In this work, we present a new framework named ExHalder to address this challenge for headline hallucination detection. ExHalder adapts the knowledge from public natural language inference datasets into the news domain and learns to generate natural language sentences to explain the hallucination detection results. To evaluate the model performance, we carefully collect a dataset with more than six thousand labeled <article, headline> pairs. Extensive experiments on this dataset and another six public ones demonstrate that ExHalder can identify hallucinated headlines accurately and justify its predictions with human-readable natural language explanations.
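As a rough illustration of how headline hallucination detection can be framed as a natural language inference (NLI) problem, the sketch below treats the article as the premise and the headline as the hypothesis. It is not ExHalder's implementation: the names `NLIExample`, `to_nli_example`, and `nli_label_to_hallucination` are hypothetical, and the rule that any non-entailment label counts as a hallucination is an assumption made here for illustration, not something the abstract states.

```python
from dataclasses import dataclass

# Assumed three-way label set, as used by common public NLI datasets.
NLI_LABELS = ("entailment", "neutral", "contradiction")


@dataclass(frozen=True)
class NLIExample:
    """Hypothetical container for one <article, headline> pair in NLI form."""
    premise: str     # the news article body
    hypothesis: str  # the candidate headline


def to_nli_example(article: str, headline: str) -> NLIExample:
    """Cast an <article, headline> pair as an NLI premise/hypothesis pair."""
    return NLIExample(premise=article.strip(), hypothesis=headline.strip())


def nli_label_to_hallucination(label: str) -> bool:
    """Map an NLI label to a binary hallucination flag.

    Assumption for this sketch: a headline is treated as supported only
    when the article entails it; "neutral" and "contradiction" both count
    as hallucinated.
    """
    if label not in NLI_LABELS:
        raise ValueError(f"unknown NLI label: {label!r}")
    return label != "entailment"
```

Under this framing, any model trained on public NLI data could in principle supply the label for an `NLIExample`. The natural language explanations that the abstract describes would need a separate generation step that this sketch does not cover.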

