Hate Speech and Offensive Language Detection using an Emotion-aware Shared Encoder

The rise of social media platforms has fundamentally altered how people communicate, and one result of this development is an increase in abusive content online. Automatically detecting this content is therefore essential for banning inappropriate content and reducing toxicity and violence on social media platforms. Existing work on hate speech and offensive language detection achieves promising results with pre-trained transformer models; however, it considers only abusive-content features learned from annotated datasets. This paper proposes a multi-task joint learning approach that incorporates external emotional features extracted from other corpora to deal with the imbalance and scarcity of labeled datasets. Our analysis uses two well-known Transformer-based models, BERT and mBERT, where the latter is used to address abusive content detection in multilingual scenarios. Our model jointly learns abusive content detection and emotional features by sharing representations through the transformer's shared encoder. This approach increases data efficiency, reduces overfitting via shared representations, and ensures fast learning by leveraging auxiliary information. Our findings demonstrate that emotional knowledge helps to identify hate speech and offensive language more reliably across datasets. Our multi-task hate speech detection model exhibited a 3% performance improvement over baseline models, but the improvement from multi-task models was not significant for the offensive language detection task. More interestingly, in both tasks, multi-task models exhibit fewer false positive errors than the single-task scenario.
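The shared-encoder idea in the abstract can be illustrated with a minimal sketch of hard parameter sharing. This is not the authors' implementation: a mean-pooled embedding table stands in for BERT or mBERT so the example runs with the standard library alone, and the class name, label counts, and `emotion_weight` are hypothetical. The abstract does not say how the two task losses are combined, so the weighted sum below is an assumption.

```python
import math
import random


def cross_entropy(logits, label):
    """Negative log-probability of the gold label under a softmax."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[label]


def random_matrix(rng, rows, cols):
    """Small random weights; a real model would load pre-trained ones."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(head, hidden_state):
    """Apply a task-specific linear head to the shared representation."""
    return [sum(w * x for w, x in zip(row, hidden_state)) for row in head]


class SharedEncoderMultiTask:
    """One shared encoder with two task heads (hypothetical stand-in for BERT)."""

    def __init__(self, vocab_size, hidden, n_abuse, n_emotion, seed=0):
        rng = random.Random(seed)
        # Shared parameters: every example from either task passes through these.
        self.embedding = random_matrix(rng, vocab_size, hidden)
        # Task-specific parameters: one classifier per task.
        self.abuse_head = random_matrix(rng, n_abuse, hidden)
        self.emotion_head = random_matrix(rng, n_emotion, hidden)

    def encode(self, token_ids):
        """Mean-pool token embeddings, then squash with tanh."""
        rows = [self.embedding[t] for t in token_ids]
        return [math.tanh(sum(col) / len(rows)) for col in zip(*rows)]

    def forward(self, token_ids):
        """Return (abuse logits, emotion logits) for one token sequence."""
        hidden_state = self.encode(token_ids)
        return (project(self.abuse_head, hidden_state),
                project(self.emotion_head, hidden_state))


def joint_loss(model, abuse_batch, emotion_batch, emotion_weight=0.5):
    """Mean abuse loss plus a weighted mean emotion loss (assumed combination).

    The two batches come from different corpora, so each example carries a
    label for only one task and contributes only to that task's loss.
    """
    abuse = sum(cross_entropy(model.forward(ids)[0], y)
                for ids, y in abuse_batch) / len(abuse_batch)
    emotion = sum(cross_entropy(model.forward(ids)[1], y)
                  for ids, y in emotion_batch) / len(emotion_batch)
    return abuse + emotion_weight * emotion


# Toy token ids and labels, invented purely for illustration.
model = SharedEncoderMultiTask(vocab_size=10, hidden=8, n_abuse=2, n_emotion=6)
abuse_batch = [([1, 2, 3], 1), ([4, 5], 0)]
emotion_batch = [([6, 7, 8], 4), ([2, 9], 0)]
loss = joint_loss(model, abuse_batch, emotion_batch)
print(f"joint loss: {loss:.4f}")
```

Because both loss terms depend on the same `embedding` parameters, gradient updates driven by the emotion corpus would also shape the representation used for abuse detection, which is the mechanism the abstract credits for better data efficiency and reduced overfitting.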

