Detecting COVID-19 Conspiracy Theories with Transformers and TF-IDF

The sharing of fake news and conspiracy theories on social media has widespread negative effects. By designing and applying different machine learning models, researchers have made progress in detecting fake news from text. However, existing research places a heavy emphasis on general, common-sense fake news, while in reality fake news often involves rapidly changing topics and domain-specific vocabulary. In this paper, we present our methods and results for three fake news detection tasks at the MediaEval 2021 benchmark that specifically involve COVID-19-related topics. We experiment with a group of text-based models including Support Vector Machines, Random Forest, BERT, and RoBERTa. We find that a pre-trained transformer yields the best validation results, but a randomly initialized transformer with smart design can also be trained to reach an accuracy close to that of the pre-trained transformer.


Related content

TF-IDF


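TF-IDF (term frequency–inverse document frequency) is a weighting technique commonly used in information retrieval and text mining. It is a statistical measure of how important a word is to one document within a document collection or corpus. A word's importance increases in proportion to the number of times it appears in the document, and decreases in inverse proportion to how frequently it appears across the corpus. Variants of TF-IDF weighting are often used by search engines as a measure or ranking of the relevance between a document and a user query. Besides TF-IDF, web search engines also use ranking methods based on link analysis to determine the order in which documents appear in search results.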
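The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the paper: it assumes term frequency is the term's count divided by the document length and inverse document frequency is the unsmoothed `log(N / df)`. The function name `tf_idf` and the three toy documents are hypothetical, and library implementations typically use smoothed or normalized variants.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "the virus spreads through the air".split(),
    "the vaccine contains a microchip".split(),
    "the virus was made in a lab".split(),
]
weights = tf_idf(docs)
```

Under these assumptions, a term that occurs in every document (here "the") gets weight zero, while a term confined to a single document (here "microchip") gets the largest weight in that document, which matches the proportional and inverse-proportional behavior described above.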

[Google] Adversarial Robustness in Deep Learning (43-page slide deck)

Zhuanzhi Member Service

45+ reads · October 31, 2020

[Collection of cross-lingual BERT models] Transfer learning is increasingly going multilingual with language-specific BERT models

Zhuanzhi Member Service

54+ reads · January 30, 2020

[O'Reilly talk] Anomaly detection using deep learning to measure the quality of large datasets

Zhuanzhi Member Service

31+ reads · January 11, 2020

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Zhuanzhi Member Service

50+ reads · October 17, 2019