Recently, the NLP community has started showing interest in the challenging task of Hostile Post Detection. This paper presents our system for the shared task at CONSTRAINT 2021 on "Hostile Post Detection in Hindi". The data for this shared task, collected from Twitter and Facebook, is provided in Hindi in the Devanagari script. It is a multi-label multi-class classification problem where each data instance is annotated with one or more of five classes: fake, hate, offensive, defamation, and non-hostile. We propose a two-level architecture composed of BERT-based classifiers and statistical classifiers to solve this problem. Our team 'Albatross' scored a coarse-grained hostility F1 score of 0.9709 on the Hostile Post Detection in Hindi subtask and secured 2nd rank out of 45 teams. Our submissions are ranked 2nd and 3rd out of a total of 156 submissions, with coarse-grained hostility F1 scores of 0.9709 and 0.9703 respectively. Our fine-grained scores are also very encouraging and can be improved with further fine-tuning. The code is publicly available.
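To make the two-level design concrete, below is a minimal sketch of such a pipeline: a BERT-based classifier first separates hostile from non-hostile posts, and simple statistical classifiers (TF-IDF plus logistic regression, one per fine-grained class) then assign the fine-grained labels to hostile posts. The checkpoint name, label ordering, and hyperparameters are illustrative assumptions, not the configuration released with the paper.

```python
# Sketch of a two-level hostile-post classifier (illustrative, not the authors' released code).
import torch
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from transformers import AutoModelForSequenceClassification, AutoTokenizer

FINE_LABELS = ["fake", "hate", "offensive", "defamation"]

# Level 1: coarse hostile vs. non-hostile classifier on a multilingual BERT
# checkpoint (hypothetical choice; any Hindi-capable encoder could be used).
# The classification head is randomly initialized here and would need to be
# fine-tuned on the shared-task training data before use.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
coarse_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)

def is_hostile(text: str) -> bool:
    """Return True if the coarse classifier predicts the 'hostile' class (assumed index 1)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = coarse_model(**inputs).logits
    return int(logits.argmax(dim=-1)) == 1

# Level 2: one binary statistical classifier per fine-grained label (one-vs-rest).
# Each pipeline must be fit on labeled hostile posts before prediction.
fine_classifiers = {
    label: make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    for label in FINE_LABELS
}

def predict(text: str) -> list[str]:
    """Multi-label prediction: 'non-hostile', or any subset of the fine-grained labels."""
    if not is_hostile(text):
        return ["non-hostile"]
    return [label for label, clf in fine_classifiers.items() if clf.predict([text])[0] == 1]
```

This split mirrors the abstract's description: the transformer handles the coarse hostile/non-hostile decision, while lightweight statistical models handle the multi-label fine-grained assignment on the hostile subset.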