Over the past decade, we have witnessed the rise of misinformation on the Internet, with online users constantly falling victim to fake news. A multitude of past studies have analyzed the diffusion mechanics of fake news, along with detection and mitigation techniques. However, open questions remain about the operational behavior of fake news websites, such as: How old are these websites? Do they typically stay online for long periods of time? Do such websites synchronize their up and down time with each other? Do they share similar content over time? Which third parties support their operations? How much user traffic do they attract compared to mainstream or real news websites? In this paper, we perform a first-of-its-kind investigation to answer such questions about the online presence of fake news websites and characterize their behavior in comparison to real news websites. Based on our findings, we build a content-agnostic ML classifier (i.e. accuracy) for the automatic detection of fake news websites that are not yet included in manually curated blacklists.
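A content-agnostic classifier of the kind described above could be sketched as follows. The feature set here (domain age, uptime ratio, third-party count, traffic level) is an illustrative assumption inspired by the questions posed in the abstract, not the paper's actual feature set, and the training data is synthetic:

```python
# Minimal sketch of a content-agnostic fake-news-website classifier.
# Features and their distributions are illustrative assumptions, not the
# paper's actual feature set; the data below is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 400

# Hypothetical "real news" sites: older domains, higher uptime, more traffic.
real = np.column_stack([
    rng.normal(15, 5, n),       # domain age (years)
    rng.uniform(0.9, 1.0, n),   # uptime ratio over the observation window
    rng.poisson(30, n),         # number of third-party domains contacted
    rng.uniform(0.5, 1.0, n),   # normalized traffic level (higher = more popular)
])
# Hypothetical "fake news" sites: younger, flakier, less traffic.
fake = np.column_stack([
    rng.normal(2, 1, n),
    rng.uniform(0.4, 0.9, n),
    rng.poisson(10, n),
    rng.uniform(0.0, 0.4, n),
])

X = np.vstack([real, fake])
y = np.array([0] * n + [1] * n)  # 1 = fake news website
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Note: no page content is used -- only operational/network features,
# which is what makes the classifier content-agnostic.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"held-out accuracy: {acc:.2f}")
```

Because such a classifier never inspects article text, it can flag candidate sites before any content-based analysis, which is what allows it to catch websites not yet present in manually curated blacklists.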