Text classification is a fundamental task in natural language processing (NLP). Recently, graph neural networks (GNNs) have developed rapidly and been applied to text classification tasks. As a special kind of graph, a tree has a simpler structure and can provide rich hierarchical information for text classification. Inspired by structural entropy, we construct the coding tree of a graph by minimizing its structural entropy and propose HINT, which aims to make full use of the hierarchical information contained in text for text classification. Specifically, we first establish a dependency parsing graph for each text. Then we design a structural entropy minimization algorithm to decode the key information in the graph and convert each graph into its corresponding coding tree. Based on the hierarchical structure of the coding tree, the representation of the entire graph is obtained by updating the representations of the non-leaf nodes of the coding tree layer by layer. Finally, we demonstrate the effectiveness of hierarchical information in text classification. Experimental results show that HINT outperforms state-of-the-art methods on popular benchmarks while having a simple structure and few parameters.
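To illustrate the final step described above (layer-by-layer, bottom-up aggregation over the coding tree), the sketch below assumes a coding tree has already been produced by the structural entropy minimization step and that leaf nodes carry word embeddings from the dependency parsing graph. The names (`CodingTreeNode`, `aggregate_bottom_up`) and the mean-pooling plus linear-transform update are illustrative assumptions, not the paper's actual HINT implementation.

```python
# Minimal sketch: update non-leaf representations of a coding tree layer by
# layer (via post-order traversal) and use the root representation as the
# embedding of the whole text graph. All details are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np


@dataclass
class CodingTreeNode:
    children: List["CodingTreeNode"] = field(default_factory=list)
    # Leaves carry input word embeddings; non-leaf representations are
    # filled in during aggregation.
    h: Optional[np.ndarray] = None


def aggregate_bottom_up(node: CodingTreeNode, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Compute a node's representation from its children, bottom up."""
    if not node.children:                      # leaf: keep its input embedding
        return node.h
    child_reps = np.stack([aggregate_bottom_up(c, W, b) for c in node.children])
    pooled = child_reps.mean(axis=0)           # mean pooling over children (assumed)
    node.h = np.maximum(0.0, pooled @ W + b)   # linear transform + ReLU (assumed)
    return node.h


if __name__ == "__main__":
    d = 8
    rng = np.random.default_rng(0)
    # Toy coding tree: root -> two internal nodes -> four leaves (word embeddings).
    leaves = [CodingTreeNode(h=rng.normal(size=d)) for _ in range(4)]
    internal = [CodingTreeNode(children=leaves[:2]), CodingTreeNode(children=leaves[2:])]
    root = CodingTreeNode(children=internal)
    W, b = rng.normal(size=(d, d)), np.zeros(d)
    text_embedding = aggregate_bottom_up(root, W, b)  # fed to a classifier in practice
    print(text_embedding.shape)                       # (8,)
```

In practice the root embedding would be passed to a classification head, and the per-layer transforms would be learned rather than fixed random matrices as in this toy example.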