Automatic identification of hateful and abusive content is vital in combating the spread of harmful online content and its damaging effects. Most existing works evaluate models by examining the generalization error on train-test splits of hate speech datasets. These datasets often differ in their definitions and labeling criteria, leading to poor model performance when predicting across new domains and datasets. In this work, we propose a new Multi-task Learning (MTL) pipeline that trains simultaneously across multiple hate speech datasets to construct a more encompassing classification model. We simulate evaluation on new, previously unseen datasets by adopting a leave-one-out scheme in which we omit a target dataset from training and jointly train on the remaining datasets. Our results consistently outperform a large sample of existing work. We show strong results when examining generalization error on train-test splits and substantial improvements when predicting on previously unseen datasets. Furthermore, we assemble a novel dataset, dubbed PubFigs, focusing on the problematic speech of American public political figures. We automatically detect problematic speech in the $305,235$ tweets in PubFigs, and we uncover insights into the posting behaviors of public figures.
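The leave-one-out multi-task setup described above can be summarized with a short sketch. The snippet below is a minimal, hypothetical illustration rather than the paper's implementation: it pairs a shared text encoder with one classification head per training dataset and enumerates leave-one-out splits. The class names, dataset names, dimensions, and the toy encoder are assumptions made purely for illustration.

```python
# Minimal sketch (assumption, not the paper's code): multi-task learning with a
# shared encoder, per-dataset classification heads, and leave-one-out splits.
import torch
import torch.nn as nn


class MultiTaskHateClassifier(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_dim: int,
                 dataset_names: list, num_classes: int = 2):
        super().__init__()
        self.encoder = encoder  # shared representation learner (e.g., a transformer)
        # one lightweight head per training dataset, keyed by dataset name
        self.heads = nn.ModuleDict({
            name: nn.Linear(hidden_dim, num_classes) for name in dataset_names
        })

    def forward(self, inputs: torch.Tensor, dataset_name: str) -> torch.Tensor:
        features = self.encoder(inputs)            # shared features for all tasks
        return self.heads[dataset_name](features)  # dataset-specific prediction


def leave_one_out_splits(datasets: dict):
    """Yield (held-out target name, dict of remaining training datasets)."""
    for target in datasets:
        train_sets = {name: d for name, d in datasets.items() if name != target}
        yield target, train_sets


# Illustrative usage with a toy encoder and placeholder dataset names.
encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU())
model = MultiTaskHateClassifier(encoder, hidden_dim=128,
                                dataset_names=["dataset_a", "dataset_b", "dataset_c"])
logits = model(torch.randn(4, 300), dataset_name="dataset_b")
```

In this sketch, the held-out target dataset contributes no head and no training examples; at evaluation time its examples would be scored with the shared encoder and the jointly trained heads, mirroring the simulated "previously unseen dataset" setting.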