Increasing amounts of freely available data, in both textual and relational form, make it possible to explore richer document representations that can potentially improve model performance and robustness. An emerging problem in the modern era is fake news detection: many easily available pieces of information are not necessarily factually correct and can lead to wrong conclusions or be used for manipulation. In this work we explore how different document representations, ranging from simple symbolic bag-of-words representations to contextual, neural language model-based ones, can be used for efficient fake news identification. One of the key contributions is a set of novel document representation learning methods based solely on knowledge graphs, i.e. extensive collections of (grounded) subject-predicate-object triplets. We demonstrate that knowledge graph-based representations already achieve performance competitive with conventionally accepted representation learners. Furthermore, when combined with existing contextual representations, knowledge graph-based document representations can achieve state-of-the-art performance. To our knowledge, this is the first larger-scale evaluation of how knowledge graph-based representations can be systematically incorporated into the process of fake news classification.
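The abstract contrasts symbolic bag-of-words features with knowledge graph-based document representations and reports that combining representation types helps. The following is a minimal sketch only. It assumes entity embeddings have already been learned from subject-predicate-object triplets by some knowledge graph embedding method. The toy vectors, the exact-token entity lookup, and the helper names `bag_of_words`, `kg_document_vector` and `combined_representation` are hypothetical simplifications and are not the authors' method. Under those assumptions, a document can be represented by averaging the embeddings of the entities it mentions and concatenating that vector with a bag-of-words vector:

```python
from collections import Counter
from typing import Dict, List, Sequence


def bag_of_words(tokens: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Symbolic representation: raw term counts over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary]


def kg_document_vector(
    tokens: Sequence[str],
    entity_embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the (assumed, pre-trained) embeddings of entities found in the text.

    Assumption: an entity is "mentioned" if a token exactly matches its key;
    a real system would use an entity linker instead.
    """
    matched = [entity_embeddings[t] for t in tokens if t in entity_embeddings]
    if not matched:
        # No known entities: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(vec[i] for vec in matched) / len(matched) for i in range(dim)]


def combined_representation(
    tokens: Sequence[str],
    vocabulary: Sequence[str],
    entity_embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Combine both views by simple concatenation (one of many possible choices)."""
    return bag_of_words(tokens, vocabulary) + kg_document_vector(
        tokens, entity_embeddings, dim
    )


if __name__ == "__main__":
    # Hypothetical toy data: 2-dimensional entity embeddings.
    vocabulary = ["nasa", "moon", "landing", "staged"]
    entity_embeddings = {"nasa": [1.0, 0.0], "moon": [0.0, 1.0]}
    tokens = "nasa staged the moon landing".split()
    print(combined_representation(tokens, vocabulary, entity_embeddings, dim=2))
    # prints [1.0, 1.0, 1.0, 1.0, 0.5, 0.5]
```

Concatenation is used here only because it is the simplest way to merge two feature views. In practice the combined vectors would be passed to a classifier, and a contextual (language model-based) representation could be concatenated in the same way.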