New events emerge over time, influencing the topics of rumors on social media. Current rumor detection benchmarks are randomly split into training, development and test sets, which typically results in topical overlap between the sets. Consequently, models trained on random splits may not perform well when classifying rumors on previously unseen topics because of temporal concept drift. In this paper, we re-evaluate classification models on four popular rumor detection benchmarks using chronological instead of random splits. Our experimental results show that random splits can significantly overestimate predictive performance across all datasets and models. We therefore suggest that rumor detection models always be evaluated on chronological splits to minimize topical overlap.
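The difference between the two splitting strategies can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the 70/10/20 ratios, the `date` and `topic` fields, and the toy posts. The overlap measure is likewise a simple stand-in, not the evaluation used in the experiments.

```python
import random
from datetime import date


def _cut(seq, ratios):
    """Cut a sequence into train/dev/test parts by the given ratios."""
    n = len(seq)
    n_train = round(n * ratios[0])
    n_dev = round(n * ratios[1])
    return seq[:n_train], seq[n_train:n_train + n_dev], seq[n_train + n_dev:]


def random_split(items, ratios=(0.7, 0.1, 0.2), seed=0):
    """Shuffle the items, then cut; topics can leak across the sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return _cut(shuffled, ratios)


def chronological_split(items, ratios=(0.7, 0.1, 0.2)):
    """Sort oldest first, then cut; the test set holds the newest posts."""
    ordered = sorted(items, key=lambda post: post["date"])
    return _cut(ordered, ratios)


def topic_overlap(train, test):
    """Fraction of test posts whose topic also occurs in the training set."""
    if not test:
        return 0.0
    train_topics = {post["topic"] for post in train}
    return sum(post["topic"] in train_topics for post in test) / len(test)


# Hypothetical toy data: ten posts, one per month, where each topic
# (A, B, C) occupies its own contiguous period of time.
posts = [
    {"date": date(2020, month, 1), "topic": topic}
    for month, topic in zip(range(1, 11), "AAAABBBCCC")
]

rand_train, rand_dev, rand_test = random_split(posts)
chron_train, chron_dev, chron_test = chronological_split(posts)

print("random split overlap:       ", topic_overlap(rand_train, rand_test))
print("chronological split overlap:", topic_overlap(chron_train, chron_test))
```

On this toy data the chronological test set contains only the newest topic, so none of its topics appear in the training set. The overlap under the random split depends on the shuffle, but because each topic is spread across the whole collection, test topics will usually also occur in training.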