With the COVID-19 pandemic, related fake news is spreading widely across social media. Believing such news indiscriminately can cause great harm to people's lives. However, general-purpose language models may perform poorly at detecting this fake news, owing to the lack of large-scale annotated data and insufficient semantic understanding of domain-specific knowledge; meanwhile, models trained only on the corresponding corpora are also mediocre due to insufficient learning. In this paper, we propose a novel transformer-based language model fine-tuning approach for this fake news detection task. First, the token vocabulary of each individual model is expanded to capture the actual semantics of professional phrases. Second, we adapt the heated-up softmax loss to distinguish hard samples, which are common in fake news because of the ambiguity of short texts. Then, we apply adversarial training to improve the model's robustness. Last, the predicted features extracted by the universal language model RoBERTa and the domain-specific model CT-BERT are fused by a multilayer perceptron to integrate fine-grained and high-level domain-specific representations. Quantitative experiments on an existing COVID-19 fake news dataset demonstrate superior performance compared with state-of-the-art methods across various evaluation metrics, with the best weighted average F1 score reaching 99.02%.
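
To illustrate the vocabulary-expansion step, the sketch below registers domain-specific phrases with a tokenizer so they are no longer fragmented into subwords, then resizes the model's embedding matrix accordingly. It uses the Hugging Face transformers API; the checkpoint name and the added tokens are illustrative assumptions, not taken from the paper.

    # A minimal sketch of vocabulary expansion with Hugging Face transformers.
    # The checkpoint and the new tokens are hypothetical examples.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=2
    )

    # Register professional phrases as whole tokens so their semantics are
    # carried by a single, learnable embedding instead of subword pieces.
    new_tokens = ["covid-19", "sars-cov-2", "social distancing"]
    tokenizer.add_tokens(new_tokens)

    # Grow the embedding matrix; the new rows are randomly initialized
    # and are learned during fine-tuning.
    model.resize_token_embeddings(len(tokenizer))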
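
One common formulation of the heated-up softmax loss, written here as a sketch rather than the paper's exact definition, is temperature-scaled cross-entropy. Assuming logits \(z_{i,j}\) for sample \(i\) and class \(j\), gold label \(y_i\), and an inverse-temperature factor \(\alpha = 1/T\) whose schedule controls how strongly hard samples dominate the gradient:

\[
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp(\alpha \, z_{i,y_i})}{\sum_{j} \exp(\alpha \, z_{i,j})}
\]

A larger \(\alpha\) sharpens the softmax, so well-classified samples saturate and contribute little gradient, while hard, near-boundary samples dominate the update.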
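
The adversarial-training step can be sketched as an FGM-style perturbation of the word-embedding matrix, a common recipe for text models; whether the paper uses exactly this variant is an assumption. The helper below expects a Hugging Face model whose forward pass returns a loss when labels are included in the batch.

    import torch

    def fgm_adversarial_step(model, batch, epsilon=1.0):
        # Backprop the clean loss to obtain gradients on the embeddings.
        model(**batch).loss.backward()

        emb = model.get_input_embeddings().weight
        norm = torch.norm(emb.grad)
        if norm != 0 and not torch.isnan(norm):
            delta = epsilon * emb.grad / norm  # L2-normalized ascent direction
            emb.data.add_(delta)               # perturb the embeddings
            model(**batch).loss.backward()     # accumulate adversarial grads
            emb.data.sub_(delta)               # restore the original weights
        # The outer loop performs optimizer.step() and optimizer.zero_grad().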
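
Last, a minimal sketch of the feature-fusion head, assuming each encoder emits a pooled 768-dimensional feature; the layer sizes and dropout rate are hypothetical choices, not the paper's configuration.

    import torch
    import torch.nn as nn

    class FusionClassifier(nn.Module):
        # Concatenates the pooled features of RoBERTa and CT-BERT and
        # classifies the pair with a multilayer perceptron.
        def __init__(self, hidden=768, num_labels=2):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(2 * hidden, hidden),
                nn.ReLU(),
                nn.Dropout(0.1),
                nn.Linear(hidden, num_labels),
            )

        def forward(self, roberta_feat, ctbert_feat):
            # Both inputs: (batch, hidden) pooled features from each encoder.
            fused = torch.cat([roberta_feat, ctbert_feat], dim=-1)
            return self.mlp(fused)

Concatenation keeps the fine-grained universal features and the high-level domain-specific features in separate subspaces, letting the MLP learn how to weight them.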