With the advent of social media, the volume of content shared online has grown extremely rapidly. Consequently, the propagation of fake news and hostile messages on social media platforms has also skyrocketed. In this paper, we address the problem of detecting hostile and fake content in the Devanagari (Hindi) script as a multi-class, multi-label problem. Using NLP techniques, we build a model that combines an abusive language detector with metadata and with features extracted via Hindi BERT and Hindi FastText models. Our model achieves a 0.97 F1 score on the coarse-grained evaluation of the hostility detection task. Additionally, we build models to identify fake news related to COVID-19 in English tweets. We leverage entity information extracted from the tweets along with textual representations learned from word embeddings and achieve a 0.93 F1 score on the English fake news detection task.
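As a rough illustration of the multi-label setup described above, the following sketch shows how feature vectors from several sources might be concatenated and how a coarse-grained hostile/non-hostile decision could be derived from fine-grained labels before scoring. It is not the authors' implementation. The label names in `FINE_LABELS`, the helper names, and the choice of a binary F1 on the hostile class are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Assumed fine-grained hostile labels; the abstract does not enumerate them.
FINE_LABELS = ("fake", "hate", "offensive", "defamation")


def concat_features(*parts: Sequence[float]) -> List[float]:
    """Join per-source feature vectors (e.g. BERT, FastText, metadata,
    abusive-language score) into a single input vector."""
    combined: List[float] = []
    for part in parts:
        combined.extend(part)
    return combined


def to_coarse(fine: Dict[str, bool]) -> str:
    """Map multi-label fine-grained predictions to a coarse label:
    a post counts as hostile if any fine-grained label is set."""
    is_hostile = any(fine.get(label, False) for label in FINE_LABELS)
    return "hostile" if is_hostile else "non-hostile"


def f1_score(gold: List[str], pred: List[str], positive: str = "hostile") -> float:
    """Binary F1 for the positive class (an assumed scoring choice)."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch a post can carry several fine-grained labels at once, which is what makes the problem multi-label, while the coarse-grained evaluation only asks whether any of them is present.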