The wide spread of unfounded election fraud claims surrounding the 2020 U.S. election has undermined trust in the election, culminating in violence inside the U.S. Capitol. Under these circumstances, it is critical to understand the discussions surrounding these claims on Twitter, a major platform where the claims are disseminated. To this end, we collected and release the VoterFraud2020 dataset, a multi-modal dataset with 7.6M tweets and 25.6M retweets from 2.6M users related to voter fraud claims. To make this data immediately useful for a wide range of researchers, we further enhance the data with cluster labels computed from the retweet graph, user suspension status, and perceptual hashes of tweeted images. The dataset also includes aggregated information for all external links and YouTube videos that appear in the tweets. Preliminary analyses of the data show that Twitter's ban actions mostly affected a specific community of voter fraud claim promoters, and expose the most common URLs, images, and YouTube videos shared in the data.