A drastic rise in potentially life-threatening misinformation has been a by-product of the COVID-19 pandemic. Computational support for identifying false information within the massive body of data on the topic is crucial to prevent harm. Researchers have proposed many methods for flagging online misinformation related to COVID-19. However, these methods predominantly target specific content types (e.g., news) or platforms (e.g., Twitter), and how well they generalize has so far remained largely unclear. To fill this gap, we evaluate fifteen Transformer-based models on five COVID-19 misinformation datasets that include social media posts, news articles, and scientific papers. We show that tokenizers and models tailored to COVID-19 data do not provide a significant advantage over general-purpose ones. Our study provides a realistic assessment of models for detecting COVID-19 misinformation. We expect that evaluating a broad spectrum of datasets and models will benefit future research on developing misinformation detection systems.