The sudden, widespread menace of the current global COVID-19 pandemic has had an unprecedented effect on our lives. Mankind is experiencing immense fear and an unprecedented dependence on social media. Fear inevitably leads to panic, speculation, and the spread of misinformation. Many governments have taken measures to curb the spread of such misinformation for the sake of public well-being. Besides global measures, systems for demographically local languages have an important role to play in achieving effective outreach. To this end, we propose an approach for the early detection of fake news about COVID-19 in social media posts, such as tweets, in multiple Indic languages as well as English. In addition, we create an annotated dataset of Hindi and Bengali tweets for fake news detection. We propose a BERT-based model augmented with additional relevant features extracted from Twitter to identify fake tweets. To extend our approach to multiple Indic languages, we use an mBERT-based model fine-tuned on the created Hindi and Bengali dataset. We also propose a zero-shot learning approach to alleviate the data scarcity issue for such low-resource languages. Through rigorous experiments, we show that our approach reaches an F-score of around 89% in fake tweet detection, which surpasses the state-of-the-art (SOTA) results. Moreover, we establish the first benchmark for two Indic languages, Hindi and Bengali. Using our annotated data, our model achieves an F-score of about 79% on Hindi tweets and 81% on Bengali tweets. Our zero-shot model achieves an F-score of about 81% on Hindi tweets and 78% on Bengali tweets without any annotated data, which indicates the efficacy of our approach.