This paper gives an overview of the first shared task on fake news detection in the Urdu language, held at FIRE 2020. The task is a binary classification problem: identify fake news using a dataset of 900 annotated news articles for training and 400 news articles for testing. The dataset covers news in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business. Forty-two teams from six countries (India, China, Egypt, Germany, Pakistan, and the UK) registered for the task, and nine teams submitted experimental results. The participants used a range of machine learning methods, from feature-based traditional machine learning to neural network techniques. The best-performing system achieved an F-score of 0.90, showing that its BERT-based approach outperformed the other machine learning classifiers.