The emergence of social media platforms has fundamentally altered how people communicate, and one result of this development is an increase in abusive content online. Automatically detecting this content is therefore essential for removing inappropriate information and reducing toxicity and violence on social media platforms. Existing work on hate speech and offensive language detection produces promising results with pre-trained transformer models; however, it considers only abusive content features derived from annotated datasets. This paper presents a multi-task joint learning approach that incorporates external emotional features extracted from other corpora to deal with the imbalance and scarcity of labeled datasets. Our analysis uses two well-known Transformer-based models, BERT and mBERT, where the latter is used to address abusive content detection in multilingual scenarios. Our model jointly learns abusive content detection and emotional features by sharing representations through the transformer's shared encoder. This approach increases data efficiency, reduces overfitting via shared representations, and enables fast learning by leveraging auxiliary information. Our findings demonstrate that emotional knowledge helps to identify hate speech and offensive language more reliably across datasets. Our multi-task hate speech detection model exhibited a 3% performance improvement over baseline models, but the improvement from multi-task models was not significant for the offensive language detection task. More interestingly, in both tasks, multi-task models exhibit fewer false-positive errors than the single-task scenario.
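The shared-encoder idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: a toy `tanh` layer stands in for the BERT/mBERT encoder, the names `MultiTaskModel`, `task_loss`, and `joint_loss` are hypothetical, and the weighted sum with `emotion_weight` is an assumed way of combining the two losses, since the abstract does not say how they are combined. The point is only that batches from two different corpora pass through the same encoder parameters before reaching task-specific heads.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def init_layer(rng, n_out, n_in):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskModel:
    """One shared encoder, one classification head per task."""

    def __init__(self, n_features, hidden, n_abuse_classes,
                 n_emotion_classes, seed=0):
        rng = random.Random(seed)
        # Toy stand-in for the shared transformer encoder (assumption).
        self.encoder = init_layer(rng, hidden, n_features)
        self.heads = {
            "abuse": init_layer(rng, n_abuse_classes, hidden),
            "emotion": init_layer(rng, n_emotion_classes, hidden),
        }

    def encode(self, x):
        return [math.tanh(h) for h in linear(x, *self.encoder)]

    def logits(self, x, task):
        # Every task reads the same encoded representation.
        return linear(self.encode(x), *self.heads[task])


def task_loss(model, batch, task):
    """Mean cross-entropy of one task over a batch of (features, label)."""
    losses = [cross_entropy(model.logits(x, task), y) for x, y in batch]
    return sum(losses) / len(losses)


def joint_loss(model, abuse_batch, emotion_batch, emotion_weight=0.5):
    """Assumed objective: main-task loss plus weighted auxiliary loss."""
    return (task_loss(model, abuse_batch, "abuse")
            + emotion_weight * task_loss(model, emotion_batch, "emotion"))


# Illustrative data only: the two batches come from different corpora.
model = MultiTaskModel(n_features=4, hidden=3,
                       n_abuse_classes=2, n_emotion_classes=6)
abuse_batch = [([1.0, 0.0, 1.0, 0.0], 1), ([0.0, 1.0, 0.0, 1.0], 0)]
emotion_batch = [([1.0, 1.0, 0.0, 0.0], 4)]
print(joint_loss(model, abuse_batch, emotion_batch))
```

In a real setup the gradient of this joint loss would update the encoder from both tasks, which is where the auxiliary emotion data can help when abusive-content labels are scarce or imbalanced.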