Maintaining an online presence on social media platforms such as Facebook and Twitter has become a daily habit for internet users. Despite the vast range of services these platforms offer, users suffer from cyberbullying, which leads to mental abuse and may escalate into physical harm to individuals or targeted groups. In this paper, we present our submission to the Arabic Hate Speech 2022 Shared Task Workshop (OSACT5 2022), which uses the associated Arabic Twitter dataset. The shared task consists of three sub-tasks: sub-task A detects whether a tweet is offensive; for offensive tweets, sub-task B detects whether the tweet is hate speech; and for hate-speech tweets, sub-task C identifies the fine-grained type of hate speech among six classes. Transformer models have proven effective in classification tasks, but they tend to over-fit when fine-tuned on small or imbalanced datasets. We overcome this limitation by investigating multiple training paradigms, such as contrastive learning and multi-task learning, alongside classification fine-tuning, and by ensembling our top five performers. Our proposed solution achieved macro-average F1 scores of 0.841, 0.817, and 0.476 on sub-tasks A, B, and C, respectively.
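The three sub-tasks form a hierarchical cascade: a tweet only reaches sub-task B if flagged as offensive, and only reaches sub-task C if flagged as hate speech. A minimal sketch of that cascade is below; the three classifier functions and the label names are illustrative stand-ins (assumptions), not the models or label set from this work.

```python
# Hypothetical sketch of the hierarchical sub-task cascade (A -> B -> C).
# The three classifiers below are keyword stand-ins for illustration only,
# not the transformer models described in the paper.

def classify_subtask_a(tweet: str) -> str:
    # Stand-in sub-task A: offensive vs. not offensive.
    return "OFF" if "insult" in tweet else "NOT_OFF"

def classify_subtask_b(tweet: str) -> str:
    # Stand-in sub-task B: hate speech vs. not, applied only to offensive tweets.
    return "HS" if "slur" in tweet else "NOT_HS"

def classify_subtask_c(tweet: str) -> str:
    # Stand-in sub-task C: one of six fine-grained hate-speech classes.
    return "HS_CLASS_1"

def pipeline(tweet: str):
    """Cascade the sub-tasks, stopping early for non-offensive or non-hate tweets."""
    if classify_subtask_a(tweet) == "NOT_OFF":
        return ("NOT_OFF", None, None)
    if classify_subtask_b(tweet) == "NOT_HS":
        return ("OFF", "NOT_HS", None)
    return ("OFF", "HS", classify_subtask_c(tweet))
```

The early-exit structure mirrors the shared-task design: sub-task B labels exist only for offensive tweets, and sub-task C labels only for hate-speech tweets.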