Heterogeneous networks, which connect informative text-bearing nodes through different edge types, are routinely used to store and process information in real-world applications. Graph Neural Networks (GNNs) and their hyperbolic variants offer a promising way to encode such networks in a low-dimensional latent space, through neighborhood aggregation and hierarchical feature extraction, respectively. However, these approaches typically ignore metapath structures and the available semantic information, and they are sensitive to noise in the training data. To address these limitations, we propose the Text Enriched Sparse Hyperbolic Graph Convolution Network (TESH-GCN), which uses semantic signals to capture a graph's metapath structures and improve prediction in large heterogeneous graphs. In TESH-GCN, we extract semantic node information, which then acts as a connection signal for extracting the relevant nodes' local neighborhood and graph-level metapath features from the sparse adjacency tensor in a reformulated hyperbolic graph convolution layer. These extracted features, together with semantic features from the language model (included for robustness), are used for the final downstream task. Experiments on various heterogeneous graph datasets show that our model outperforms current state-of-the-art approaches by a large margin on link prediction. We also report that the reformulated hyperbolic graph convolution reduces both training time and the number of model parameters compared to existing hyperbolic approaches. Finally, we demonstrate the robustness of our model under different levels of simulated noise in both the graph structure and the text, and we present a mechanism for explaining TESH-GCN's predictions by analyzing the extracted metapaths.