Recently, graph neural networks (GNNs) have been widely used for document classification. However, most existing methods are based on static word co-occurrence graphs without sentence-level information, which poses three challenges: (1) word ambiguity, (2) word synonymity, and (3) dynamic contextual dependency. To address these challenges, we propose a novel GNN-based sparse structure learning model for inductive document classification. Specifically, a document-level graph is initially generated as a disjoint union of sentence-level word co-occurrence graphs. Our model collects a set of trainable edges connecting disjoint words between sentences and employs structure learning to sparsely select edges with dynamic contextual dependencies. Graphs with sparse structures can jointly exploit local and global contextual information in documents through GNNs. For inductive learning, the refined document graph is further fed into a general readout function for graph-level classification and optimized in an end-to-end manner. Extensive experiments on several real-world datasets demonstrate that the proposed model outperforms most state-of-the-art methods and reveal the necessity of learning sparse structures for each document.
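Below is a minimal sketch, in plain PyTorch, of the pipeline the abstract describes: sentence-level sliding-window co-occurrence graphs joined as a disjoint union, candidate inter-sentence edges, a learned sparse selection over those edges, a small GNN, and a mean-pooling readout for graph-level classification. The window size, the choice of same-word pairs as candidate edges, the sigmoid gate with a straight-through estimator, and the simplified GCN layers are all illustrative assumptions, not the authors' implementation or their exact structure-learning mechanism.

```python
# Sketch of a sparse-structure document-graph classifier (illustrative only).
import itertools
import torch
import torch.nn as nn


def sentence_cooccurrence_edges(tokens, offset, window=3):
    """Undirected co-occurrence edges within one sentence (node ids shifted by offset)."""
    edges = set()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            edges.add((offset + i, offset + j))
            edges.add((offset + j, offset + i))
    return edges


def build_document_graph(sentences):
    """Disjoint union of sentence graphs plus candidate edges linking occurrences of
    the same word in different sentences; the candidates are what gets sparsely selected."""
    node_words, node_sent, intra_edges = [], [], set()
    for s_id, sent in enumerate(sentences):
        intra_edges |= sentence_cooccurrence_edges(sent, len(node_words))
        node_words.extend(sent)
        node_sent.extend([s_id] * len(sent))
    candidates = [(u, v) for u, v in itertools.combinations(range(len(node_words)), 2)
                  if node_words[u] == node_words[v] and node_sent[u] != node_sent[v]]
    return node_words, sorted(intra_edges), candidates


class SparseDocGraphClassifier(nn.Module):
    def __init__(self, vocab_size, dim, num_classes):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.edge_scorer = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.gcn1 = nn.Linear(dim, dim)
        self.gcn2 = nn.Linear(dim, dim)
        self.readout = nn.Linear(dim, num_classes)

    def propagate(self, x, adj):
        # Mean aggregation over neighbors (a simplified GCN-style update).
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return torch.relu((adj @ x) / deg)

    def forward(self, word_ids, intra_edges, candidate_edges):
        n = word_ids.size(0)
        x = self.emb(word_ids)
        adj = torch.zeros(n, n, device=x.device)
        src, dst = zip(*intra_edges)
        adj[list(src), list(dst)] = 1.0
        adj = adj + torch.eye(n, device=x.device)          # self-loops
        if candidate_edges:
            u, v = map(list, zip(*candidate_edges))
            scores = self.edge_scorer(torch.cat([x[u], x[v]], dim=-1)).squeeze(-1)
            gate = torch.sigmoid(scores)
            keep = (gate > 0.5).float()                    # hard, sparse edge selection
            gate = keep + gate - gate.detach()             # straight-through gradient
            adj[u, v] = gate
            adj[v, u] = gate
        h = self.propagate(self.gcn1(x), adj)
        h = self.propagate(self.gcn2(h), adj)
        return self.readout(h.mean(dim=0))                 # graph-level logits


# Toy usage: two sentences that share the word "graph".
sents = [["graph", "networks", "classify", "documents"],
         ["sparse", "graph", "structure", "learning"]]
vocab = {w: i for i, w in enumerate(sorted({w for s in sents for w in s}))}
ids = [[vocab[w] for w in s] for s in sents]
words, intra, cand = build_document_graph(ids)
model = SparseDocGraphClassifier(vocab_size=len(vocab), dim=16, num_classes=4)
logits = model(torch.tensor(words), intra, cand)
```

Because the graph is built per document from its own tokens and the readout pools node states into a single vector, the classifier never relies on a fixed corpus-level graph, which is what makes the setting inductive.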