The rapid increase in fake news, which causes significant damage to society, has triggered many fake-news-related studies, including the development of fake news detection and fact verification techniques. The resources for these studies are mainly available as public datasets built from Web data. We conducted a large-scale survey of 118 datasets related to fake news research from three perspectives: (1) fake news detection, (2) fact verification, and (3) other tasks, such as fake news analysis and satire detection. We also describe in detail the tasks these datasets are used for and their characteristics. Finally, we highlight the challenges in constructing fake news datasets and some research opportunities that address these challenges. Our survey facilitates fake news research by helping researchers find suitable datasets without reinventing the wheel, and thereby helps deepen fake news studies.