Detecting fake news by enhanced text representation with multi-EDU-structure awareness

Because fake news poses a serious threat to society and individuals, numerous detection methods have been proposed that draw on text, propagation, and user profiles. Owing to data-collection constraints, the methods based on propagation and user profiles are less applicable in the early stages. A good alternative is to detect fake news from its text as soon as it is released, and many text-based methods have been proposed, usually taking words, sentences, or paragraphs as basic units. However, a word is too fine-grained a unit to express coherent information well, while a sentence or paragraph is too coarse to convey specific information. Which granularity is better, and how to use it to enhance text representation for fake news detection, are two key problems. In this paper, we introduce the Elementary Discourse Unit (EDU), whose granularity lies between the word and the sentence, and propose a multi-EDU-structure awareness model, namely EDU4FD, to improve text representation for fake news detection. For multi-EDU-structure awareness, we build sequence-based EDU representations and graph-based EDU representations. The former are obtained by modeling the coherence between consecutive EDUs with TextCNN, which reflects semantic coherence. For the latter, we first extract rhetorical relations to build an EDU dependency graph, which shows the global narrative logic and helps deliver the main idea truthfully. A Relation Graph Attention Network (RGAT) is then applied to obtain the graph-based EDU representation. Finally, the two EDU representations are combined into the enhanced text representation for fake news detection, using a gated recurrent unit combined with a global attention mechanism. Experiments on four cross-source fake news datasets show that our model outperforms state-of-the-art text-based methods.
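To make the two EDU views in the abstract more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. The toy 2-dimensional EDU vectors, the edge list, the relation names, the `relation_bias` table, and the function names `sequence_view` and `graph_view` are all assumptions made for illustration. A window average with ReLU stands in for TextCNN, a single dot-product attention step with a per-relation bias stands in for RGAT, and mean pooling plus concatenation stands in for the gated recurrent unit with global attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_view(edus, window=2):
    """Stand-in for the TextCNN branch (assumption): average each window
    of consecutive EDU vectors, then apply ReLU."""
    dim = len(edus[0])
    out = []
    for i in range(len(edus) - window + 1):
        chunk = edus[i:i + window]
        mixed = [sum(vec[d] for vec in chunk) / window for d in range(dim)]
        out.append([max(0.0, x) for x in mixed])
    return out


def graph_view(edus, edges, relation_bias):
    """Stand-in for the RGAT branch (assumption): each EDU attends over
    itself and its incoming neighbours; the attention score is a dot
    product plus a bias that depends on the rhetorical relation."""
    out = []
    for i, h_i in enumerate(edus):
        neighbours = [(i, "self")]
        neighbours += [(src, rel) for src, dst, rel in edges if dst == i]
        scores = [
            sum(a * b for a, b in zip(h_i, edus[j])) + relation_bias.get(rel, 0.0)
            for j, rel in neighbours
        ]
        weights = softmax(scores)
        out.append([
            sum(w * edus[j][d] for w, (j, _) in zip(weights, neighbours))
            for d in range(len(h_i))
        ])
    return out


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Hypothetical toy input: three EDUs with 2-dimensional embeddings.
edus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical dependency edges as (source EDU, target EDU, relation).
edges = [(1, 0, "elaboration"), (2, 0, "contrast")]
# Hypothetical per-relation attention biases.
relation_bias = {"self": 0.0, "elaboration": 0.5, "contrast": -0.5}

seq = sequence_view(edus)
graph = graph_view(edus, edges, relation_bias)
# Simplified fusion (assumption): pool each view and concatenate.
text_repr = mean_pool(seq) + mean_pool(graph)
```

In a real model the EDU vectors would come from an encoder and the relations from a discourse parser, and every stand-in above would be replaced by a trained layer. The sketch only shows how one view follows the order of the EDUs while the other follows the relation-labelled graph.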
