The spread of offensive content online has become a cause for great concern in recent years, motivating researchers to develop robust systems capable of identifying such content automatically. To enable a fair evaluation of these systems, several international competitions have been organized, providing the community with important benchmark data and evaluation methods for various languages. Organized since 2019, the HASOC (Hate Speech and Offensive Content Identification) shared task is one of these initiatives. Its fourth iteration, HASOC 2022, included three subtracks for English, Hindi, and Marathi. In this paper, we report the results of the HASOC 2022 Marathi subtrack, which provided participants with a dataset of Twitter data manually annotated using the popular OLID taxonomy. The Marathi track featured three additional subtracks, each corresponding to one level of the taxonomy: Task A - offensive content identification (offensive vs. non-offensive); Task B - categorization of offensive types (targeted vs. untargeted); and Task C - offensive target identification (individual vs. group vs. others). Overall, 59 runs were submitted by 10 teams. The best systems obtained an F1 of 0.9745 for Subtrack 3A, an F1 of 0.9207 for Subtrack 3B, and an F1 of 0.9607 for Subtrack 3C. The best-performing algorithms were a mixture of traditional and deep learning approaches.