Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech

Undermining the impact of hateful content with informed and non-aggressive responses, called counter narratives, has emerged as a possible solution for fostering healthier online communities. Thus, some NLP studies have started addressing the task of counter narrative generation. Although such studies have made an effort to build hate speech / counter narrative (HS/CN) datasets for neural generation, they fall short of achieving high quality, high quantity, or both. In this paper, we propose a novel human-in-the-loop data collection methodology in which a generative language model is refined iteratively by using its own data from the previous loops to generate new training samples that experts review and/or post-edit. Our experiments comprised several loops including dynamic variations. Results show that the methodology is scalable and facilitates diverse, novel, and cost-effective data collection. To our knowledge, the resulting dataset is the only expert-based multi-target HS/CN dataset available to the community.
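The iterative loop described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: `collect`, `toy_generate`, and `toy_review` are hypothetical names, the generator is a deterministic stub standing in for a language model fine-tuned on the data gathered so far, and the reviewer is a stub standing in for a human expert who rejects or post-edits each candidate.

```python
from typing import Callable, List, Optional, Tuple

# One training sample: (hate speech, counter narrative).
Pair = Tuple[str, str]


def collect(
    seed: List[Pair],
    generate: Callable[[List[Pair], int], List[Pair]],
    review: Callable[[Pair], Optional[Pair]],
    n_loops: int,
    per_loop: int,
) -> List[Pair]:
    """Grow a dataset over several loops (hypothetical helper).

    In each loop the generator is conditioned on all data accepted so far,
    and only candidates that survive review are added back.
    """
    data = list(seed)
    for _ in range(n_loops):
        candidates = generate(data, per_loop)
        for candidate in candidates:
            edited = review(candidate)
            # None means the reviewer rejected the candidate outright.
            if edited is not None and edited not in data:
                data.append(edited)
    return data


def toy_generate(training_data: List[Pair], n: int) -> List[Pair]:
    """Stub generator: assumed stand-in for fine-tuning and sampling a model."""
    start = len(training_data)
    # The padding around the counter narrative simulates raw, unedited output.
    return [(f"hs-{start + i}", f" cn-{start + i} ") for i in range(n)]


def toy_review(pair: Pair) -> Optional[Pair]:
    """Stub reviewer: assumed stand-in for expert review and post-editing."""
    hs, cn = pair
    index = int(hs.split("-")[1])
    if index % 3 == 0:
        return None  # simulate a rejected candidate
    return (hs, cn.strip())  # simulate a light post-edit


if __name__ == "__main__":
    dataset = collect([("hs-0", "cn-0")], toy_generate, toy_review, 2, 3)
    print(len(dataset), dataset[-1])
```

Passing the generator and reviewer as callables keeps the loop itself independent of any particular model or annotation interface, so either stub could be swapped for a real component.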

