Social media posts can go viral and reach large numbers of people within a short period of time. Such posts may threaten public dialogue if they contain misleading content such as fake news. Early detection of viral posts may therefore be crucial for tasks such as fact-checking. Previous work has proposed various metrics to measure virality. However, such metrics may not accurately represent viral tweets or may introduce too many false positives. In this work, we use the ground truth data provided by Twitter's "Viral Tweets" topic. We present a dataset of tweets that are labeled by Twitter as viral and a dataset of all tweets from users who authored a viral tweet. We review the previously proposed metrics for representing viral tweets and propose our own metric. We also propose a transformer-based model to predict viral tweets. The code and the tweet IDs are publicly available at: https://github.com/tugrulz/ViralTweets