Amid the COVID-19 pandemic, the world is facing an unprecedented infodemic with the proliferation of both fake and real information. Considering the problematic consequences that COVID-19 fake news has brought, the scientific community has put effort into tackling it. To contribute to this fight against the infodemic, we aim to build a robust model for the COVID-19 fake-news detection task proposed at CONSTRAINT 2021 (FakeNews-19) by taking two separate approaches: 1) fine-tuning transformer-based language models with robust loss functions and 2) removing harmful training instances through influence calculation. We further evaluate the robustness of our models by testing them on a different COVID-19 misinformation test set (Tweets-19) to understand their generalization ability. With the first approach, we achieve a 98.13% weighted F1 score (W-F1) on the shared task, but at most 38.18% W-F1 on Tweets-19. In contrast, by performing influence-based data cleansing, our model with a 99% cleansing percentage achieves a 54.33% W-F1 score on Tweets-19, with a trade-off. By evaluating our models on two COVID-19 fake-news test sets, we highlight the importance of model generalization ability in this task as a step toward tackling the COVID-19 fake-news problem on online social media platforms.
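As a minimal sketch of the first approach, the snippet below fine-tunes a transformer classifier with a noise-robust loss. The backbone (`bert-base-uncased`), the choice of symmetric cross-entropy as the robust loss, and all hyperparameters are illustrative assumptions, not the exact setup used for FakeNews-19.

```python
# Illustrative sketch only: fine-tuning a transformer classifier with a
# robust loss. Symmetric cross-entropy (SCE) is used here as one example of
# a noise-robust loss; backbone and hyperparameters are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumed backbone, for illustration only

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def symmetric_cross_entropy(logits, labels, alpha=0.1, beta=1.0, num_classes=2):
    """SCE = alpha * CE + beta * reverse CE, a label-noise-robust loss."""
    ce = F.cross_entropy(logits, labels)
    probs = F.softmax(logits, dim=1).clamp(min=1e-7)
    # Clamp the one-hot targets so log(0) is avoided in the reverse CE term.
    one_hot = F.one_hot(labels, num_classes).float().clamp(min=1e-4)
    rce = (-probs * one_hot.log()).sum(dim=1).mean()
    return alpha * ce + beta * rce

def train_step(texts, labels):
    """Run one optimization step on a small batch of texts and 0/1 labels."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    logits = model(**batch).logits
    loss = symmetric_cross_entropy(logits, torch.tensor(labels))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage (labels: 0 = real, 1 = fake).
print(train_step(["masks reduce transmission", "5G towers spread the virus"], [0, 1]))
```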