Can we combine heterogeneous graph structure with text to learn high-quality semantic and behavioural representations? Graph neural networks (GNNs) encode numerical node attributes and graph structure to achieve impressive performance in a variety of supervised learning tasks. Current GNN approaches are challenged by textual features, which typically need to be encoded into a numerical vector before being provided to the GNN, a step that may incur information loss. In this paper, we put forth an efficient and effective framework, termed language model GNN (LM-GNN), to jointly train large-scale language models and graph neural networks. The effectiveness of our framework is achieved by applying stage-wise fine-tuning of the BERT model, first with heterogeneous graph information and then with a GNN model. Several system and design optimizations are proposed to enable scalable and efficient training. LM-GNN accommodates node and edge classification as well as link prediction tasks. We evaluate the LM-GNN framework on different datasets and showcase the effectiveness of the proposed approach. LM-GNN provides competitive results in an Amazon query-purchase-product application.
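To make the stage-wise scheme concrete, the sketch below illustrates one plausible reading of it: a text encoder is first fine-tuned alone on a graph-derived link-prediction objective, and only then trained jointly with a GNN layer. This is a minimal illustration, not the paper's implementation: a toy mean-pooled embedding encoder stands in for BERT, a single mean-aggregation layer stands in for the GNN, the graph and token data are random, and all names (TextEncoder, MeanGNNLayer, link_score) are hypothetical.

```python
# Minimal two-stage LM-GNN training sketch (toy stand-ins, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEncoder(nn.Module):
    """Toy stand-in for BERT: embed tokens and mean-pool over the sequence."""
    def __init__(self, vocab_size=100, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):                 # (num_nodes, seq_len)
        return self.emb(token_ids).mean(dim=1)    # (num_nodes, dim)

class MeanGNNLayer(nn.Module):
    """Toy GraphSAGE-style layer: average neighbour features, then project."""
    def __init__(self, dim=32):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, h, adj):                    # adj: dense (num_nodes, num_nodes)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = adj @ h / deg
        return F.relu(self.proj(torch.cat([h, neigh], dim=1)))

def link_score(h, src, dst):
    """Dot-product score for candidate edges (used for link prediction)."""
    return (h[src] * h[dst]).sum(dim=1)

def link_loss(h, edges, num_nodes):
    """Binary cross-entropy over true edges vs. randomly sampled negatives."""
    pos = link_score(h, edges[:, 0], edges[:, 1])
    neg = link_score(h, edges[:, 0], torch.randint(0, num_nodes, (len(edges),)))
    return F.binary_cross_entropy_with_logits(
        torch.cat([pos, neg]),
        torch.cat([torch.ones_like(pos), torch.zeros_like(neg)]))

# Toy graph: 4 nodes with random token sequences as "text", 3 undirected edges.
token_ids = torch.randint(0, 100, (4, 8))
edges = torch.tensor([[0, 1], [1, 2], [2, 3]])
adj = torch.zeros(4, 4)
adj[edges[:, 0], edges[:, 1]] = 1.0
adj[edges[:, 1], edges[:, 0]] = 1.0

encoder, gnn = TextEncoder(), MeanGNNLayer()

# Stage 1: fine-tune the text encoder alone on graph-derived link
# prediction, so its embeddings absorb structural signal before the
# GNN is attached.
opt1 = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for _ in range(50):
    loss = link_loss(encoder(token_ids), edges, num_nodes=4)
    opt1.zero_grad(); loss.backward(); opt1.step()

# Stage 2: jointly fine-tune the encoder and the GNN on the same objective.
opt2 = torch.optim.Adam(
    list(encoder.parameters()) + list(gnn.parameters()), lr=1e-3)
for _ in range(50):
    loss = link_loss(gnn(encoder(token_ids), adj), edges, num_nodes=4)
    opt2.zero_grad(); loss.backward(); opt2.step()

print("final link-prediction loss:", loss.item())
```

The two-optimizer split mirrors the stage-wise idea described in the abstract: warming up the language model on graph signal first avoids asking a randomly initialized GNN to backpropagate useful gradients through a large encoder from the start.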