The spread of fake news has long been a social issue, and its well-recognized dangers make reliable identification necessary. Beyond causing public unease, fake news can have far more serious consequences: for instance, unverified medical advice circulated during a pandemic can lead to deaths. This study aims to build a model for identifying fake news using graphs and machine learning algorithms. Instead of scanning the news content or user information, the research focuses explicitly on the spreading network, which captures the interconnections among people, and on graph features such as eigenvector centrality, the Jaccard coefficient, and the shortest path. Fourteen features are extracted from the graphs and tested in thirteen machine learning models. Analysis of these features and comparison of the models' test results show that propensity and centrality contribute strongly to the classification. The best-performing models, a modified tree classifier and a Support Vector Classifier, reach scores of 0.9913 and 0.9987 on the Twitter15 and Twitter16 datasets, respectively. The model can effectively predict fake news, help prevent the negative social impact that fake news may cause, and provide a new perspective on graph feature selection for machine learning models.
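To make the three graph features named above concrete, the following is a minimal sketch, not the authors' implementation. It computes eigenvector centrality (by shifted power iteration), the Jaccard coefficient, and shortest-path length on a small, made-up propagation graph using only the Python standard library. The edge list, node labels, and function names are all illustrative assumptions; none of them come from the study or from the Twitter15/Twitter16 data.

```python
from collections import deque
from math import sqrt


def build_graph(edges):
    """Build undirected adjacency sets from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def eigenvector_centrality(adj, iterations=200, tol=1e-9):
    """Approximate the leading eigenvector of the adjacency matrix.

    Iterates x <- (A + I) x with L2 normalisation. The identity shift
    keeps the iteration from oscillating on bipartite graphs such as
    trees, and does not change the leading eigenvector of A.
    """
    x = {n: 1.0 for n in adj}
    for _ in range(iterations):
        new = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        norm = sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - x[n]) for n in adj) < tol:
            return new
        x = new
    return x


def jaccard_coefficient(adj, u, v):
    """|N(u) & N(v)| / |N(u) | N(v)| over the two neighbour sets."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def shortest_path_length(adj, source, target):
    """Number of edges on a shortest path (BFS); None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical cascade: "src" posts, a/b/c reshare, d/e/f reshare in turn.
edges = [("src", "a"), ("src", "b"), ("src", "c"),
         ("a", "d"), ("a", "e"), ("b", "f")]
adj = build_graph(edges)

centrality = eigenvector_centrality(adj)
most_central = max(centrality, key=centrality.get)
jaccard_ab = jaccard_coefficient(adj, "a", "b")      # 0.25
hops_src_d = shortest_path_length(adj, "src", "d")   # 2

print(most_central, jaccard_ab, hops_src_d)
```

In practice a graph library would usually replace these hand-written helpers; NetworkX, for example, provides `eigenvector_centrality`, `jaccard_coefficient`, and `shortest_path_length`. Per-node or per-pair values like these would then need to be aggregated into one feature vector per news item before being passed to a classifier. How the study performs that aggregation is not stated here.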