Social media posts may go viral and reach large numbers of people within a short period of time. Such posts may threaten the public dialogue if they contain misleading content, which makes their early detection crucial. Previous works proposed their own metrics to annotate whether a tweet is viral, so that viral tweets could later be detected automatically. However, such metrics may not accurately represent viral tweets or may introduce too many false positives. In this work, we use the ground truth data provided by Twitter's "Viral Tweets" topic to review the current metrics and also propose our own metric. We find that a tweet is more likely to be classified as viral by Twitter if the ratio of its retweets to its author's followers exceeds some threshold. In our experiments, this threshold was 2.16. This rule results in fewer false positives, although it favors smaller accounts. We also propose a transformer-based model for the early detection of viral tweets, which achieves an F1 score of 0.79. The code and the tweet IDs are publicly available at: https://github.com/tugrulz/ViralTweets
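The retweet-to-follower rule can be sketched as a simple predicate. This is a minimal illustration, not the released code: the function name `is_viral` and the constant `VIRALITY_THRESHOLD` are hypothetical, "exceeds" is assumed to mean a strict inequality, and rejecting accounts with zero followers is an assumption, since the ratio is undefined there.

```python
# Hypothetical sketch of the retweet-to-follower virality rule.
# Assumptions: "exceeds" is a strict ">" comparison, and a non-positive
# follower count is rejected because the ratio is undefined.
VIRALITY_THRESHOLD = 2.16  # threshold value reported in the experiments


def is_viral(retweet_count: int, follower_count: int,
             threshold: float = VIRALITY_THRESHOLD) -> bool:
    """Return True if retweets / followers is strictly above the threshold."""
    if follower_count <= 0:
        raise ValueError("follower_count must be positive")
    return retweet_count / follower_count > threshold


if __name__ == "__main__":
    print(is_viral(500, 200))          # ratio 2.5 -> True
    print(is_viral(400, 200))          # ratio 2.0 -> False
    print(is_viral(10_000, 1_000_000)) # ratio 0.01 -> False
```

The last call shows why the rule favors smaller accounts: 10,000 retweets are far more than most tweets receive, yet a post from an author with a million followers still falls well below the ratio threshold.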