Automatic identification of hateful and abusive content is vital in combating the spread of harmful online content and its damaging effects. Most existing works evaluate models by examining the generalization error on train-test splits of hate speech datasets. These datasets often differ in their definitions and labeling criteria, leading to poor model performance when predicting across new domains and datasets. In this work, we propose a new Multi-task Learning (MTL) pipeline that trains simultaneously across multiple hate speech datasets to construct a more encompassing classification model. We simulate evaluation on new, previously unseen datasets by adopting a leave-one-out scheme in which we omit a target dataset from training and jointly train on the remaining datasets. Our results consistently outperform a large sample of existing work. We show strong results when examining generalization error on train-test splits and substantial improvements when predicting on previously unseen datasets. Furthermore, we assemble a novel dataset, dubbed PubFigs, focusing on the problematic speech of American public political figures. We automatically detect problematic speech in the $305,235$ tweets in PubFigs, and we uncover insights into the posting behaviors of public figures.
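The leave-one-out multi-task setup described above can be summarized with a short sketch. The snippet below is a minimal, hypothetical illustration rather than the paper's implementation: it pairs a shared text encoder with one classification head per training dataset and enumerates leave-one-out splits. The class names, dataset names, dimensions, and the toy encoder are assumptions made purely for illustration.

```python
# Minimal sketch (assumption, not the paper's code): multi-task learning with a
# shared encoder, per-dataset classification heads, and leave-one-out splits.
import torch
import torch.nn as nn


class MultiTaskHateClassifier(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_dim: int,
                 dataset_names: list, num_classes: int = 2):
        super().__init__()
        self.encoder = encoder  # shared representation learner (e.g., a transformer)
        # one lightweight head per training dataset, keyed by dataset name
        self.heads = nn.ModuleDict({
            name: nn.Linear(hidden_dim, num_classes) for name in dataset_names
        })

    def forward(self, inputs: torch.Tensor, dataset_name: str) -> torch.Tensor:
        features = self.encoder(inputs)            # shared features for all tasks
        return self.heads[dataset_name](features)  # dataset-specific prediction


def leave_one_out_splits(datasets: dict):
    """Yield (held-out target name, dict of remaining training datasets)."""
    for target in datasets:
        train_sets = {name: d for name, d in datasets.items() if name != target}
        yield target, train_sets


# Illustrative usage with a toy encoder and placeholder dataset names.
encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU())
model = MultiTaskHateClassifier(encoder, hidden_dim=128,
                                dataset_names=["dataset_a", "dataset_b", "dataset_c"])
logits = model(torch.randn(4, 300), dataset_name="dataset_b")
```

In this sketch, the held-out target dataset contributes no head and no training examples; at evaluation time its examples would be scored with the shared encoder and the jointly trained heads, mirroring the simulated "previously unseen dataset" setting.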