The growing societal dependence on social media and user-generated content for news and information has increased the influence of unreliable sources and fake content, muddling public discourse and eroding trust in the media. Validating the credibility of such information is a difficult task that is susceptible to confirmation bias, which has motivated the development of algorithmic techniques to distinguish fake from real news. However, most existing methods are difficult to interpret, making it hard to establish trust in their predictions, and they rely on assumptions that are unrealistic in many real-world scenarios, e.g., the availability of audiovisual features or provenance. In this work, we focus on fake news detection of textual content using interpretable features and methods. In particular, we develop a deep probabilistic model that integrates a dense representation of textual news, learned with a variational autoencoder and bi-directional Long Short-Term Memory (LSTM) networks, with semantic topic-related features inferred from a Bayesian admixture model. Extensive experimental studies on three real-world datasets demonstrate that our model achieves performance comparable to state-of-the-art competing models while facilitating interpretability through the learned topics. Finally, we conduct model ablation studies to justify the effectiveness of integrating neural embeddings and topic features, both quantitatively through predictive performance and qualitatively through class separability in lower-dimensional embeddings.
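To make the described architecture concrete, the following is a minimal, hypothetical sketch of such a hybrid model: a bi-directional LSTM text encoder with a VAE-style latent bottleneck whose latent code is concatenated with topic proportions (e.g., from an LDA-like admixture model) before a binary fake/real classifier. All names, layer sizes, and the choice of PyTorch are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: BiLSTM + VAE latent code concatenated with topic
# features for fake news classification. Sizes and names are assumptions.
import torch
import torch.nn as nn

class TopicAugmentedVAEClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256,
                 latent_dim=64, num_topics=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Bi-directional LSTM yields a dense representation of the text.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # VAE-style bottleneck: mean and log-variance of the latent code.
        self.fc_mu = nn.Linear(2 * hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(2 * hidden_dim, latent_dim)
        # Classifier over the latent code concatenated with topic features.
        self.classifier = nn.Sequential(
            nn.Linear(latent_dim + num_topics, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, token_ids, topic_props):
        # token_ids: (batch, seq_len); topic_props: (batch, num_topics)
        h, _ = self.bilstm(self.embed(token_ids))
        pooled = h.mean(dim=1)  # average hidden states over time steps
        mu, logvar = self.fc_mu(pooled), self.fc_logvar(pooled)
        # Reparameterization trick: sample z ~ N(mu, sigma^2).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        logits = self.classifier(torch.cat([z, topic_props], dim=-1))
        return logits, mu, logvar  # mu/logvar feed the KL term of a VAE loss
```

In such a setup, training would plausibly combine a binary cross-entropy loss on the logits with the KL-divergence regularizer from the returned mu and logvar, while the topic proportions provide the interpretable features the abstract refers to.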