Automatic fake news detection is a challenging problem in the fight against misinformation, with tremendous real-world political and social impact. Past studies have proposed machine learning-based methods for detecting fake news that focus on properties of the published articles, such as the linguistic characteristics of their content; these methods are limited by language barriers, since a model built on one language does not readily transfer to another. Departing from such efforts, we propose FNDaaS, the first automatic, content-agnostic fake news detection method, which considers new and previously unstudied features such as the network and structural characteristics of each news website. The method can be deployed as a service, either at the ISP side for easier scalability and maintenance, or at the user side for better end-user privacy. We demonstrate the efficacy of our method using data crawled from existing lists of 637 fake and 1183 real news websites, and by building and testing a proof-of-concept system that implements our proposal. Our analysis of the data collected from these websites shows that the vast majority of fake news domains are very young, and that an IP address appears to stay associated with a fake news domain for a shorter period than with a real news domain. Through experiments with several machine learning classifiers, we demonstrate that FNDaaS can achieve an AUC score of up to 0.967 on past sites, and 77-92% accuracy on newly flagged ones.
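To make the idea of content-agnostic detection concrete, the following minimal Python sketch scores websites using only the two signals the abstract highlights (domain age and how long an IP has been associated with the domain) and evaluates the ranking with AUC. Everything in it is an assumption made for illustration: the `SiteFeatures` schema, the hand-written `suspicion_score` heuristic (a stand-in for a trained classifier), and the six made-up records. It is not the FNDaaS feature set, model, or data.

```python
from dataclasses import dataclass


@dataclass
class SiteFeatures:
    """Content-agnostic features for one news website (hypothetical schema)."""
    domain_age_days: float   # time since the domain was registered
    ip_lifetime_days: float  # how long the current IP has served the domain
    is_fake: bool            # ground-truth label from a curated list


def suspicion_score(site: SiteFeatures) -> float:
    """Toy score: younger domains and shorter IP lifetimes score higher.

    This heuristic stands in for a trained classifier; it is an assumption
    made for illustration, not the model used in the study.
    """
    return 1.0 / (1.0 + site.domain_age_days) + 1.0 / (1.0 + site.ip_lifetime_days)


def auc(scores: list[float], labels: list[bool]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up illustrative records, not data from the study.
SITES = [
    SiteFeatures(120, 30, True),
    SiteFeatures(400, 90, True),
    SiteFeatures(5000, 60, True),
    SiteFeatures(6000, 2000, False),
    SiteFeatures(300, 20, False),    # a young real site that the heuristic misranks
    SiteFeatures(9000, 3500, False),
]

scores = [suspicion_score(s) for s in SITES]
labels = [s.is_fake for s in SITES]
toy_auc = auc(scores, labels)
print(f"toy AUC = {toy_auc:.3f}")
```

The score never inspects article text, so the same pipeline applies to sites in any language. A realistic system would replace the heuristic with a classifier trained on many more network and structural features.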