Heterogeneous Information Networks (HINs) are essential for studying complicated networks that contain multiple edge types and node types. Meta-paths, which are sequences of node types and edge types, are the core technique for embedding HINs. Since manually curating meta-paths is time-consuming, there is a pressing need for automated meta-path generation approaches. Existing meta-path generation approaches cannot fully exploit the rich textual information in HINs, such as node names and edge type names. To address this problem, we propose MetaFill, a text-infilling-based approach for meta-path generation. The key idea of MetaFill is to formulate the meta-path identification problem as a word sequence infilling problem, which can be advanced by Pretrained Language Models (PLMs). On two real-world HIN datasets, MetaFill outperformed both existing meta-path generation methods and graph embedding methods that do not leverage meta-paths, in link prediction as well as node classification. We further demonstrated that MetaFill can accurately classify edges in the zero-shot setting, where existing approaches cannot generate any meta-paths. MetaFill exploits PLMs to generate meta-paths for graph embedding, opening up new avenues for language model applications in graph analysis.
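To make the infilling formulation concrete, the sketch below shows one way a meta-path could be written out as a word sequence and how a template with blank slots might look. It is only an illustration under stated assumptions: the function names, the `[MASK]` placeholder, and the example node and edge type names are hypothetical and are not taken from MetaFill, and no PLM is called here.

```python
from typing import List, Sequence

# Placeholder slot token (an assumption; the real token depends on the PLM used).
MASK = "[MASK]"


def meta_path_to_text(node_types: Sequence[str], edge_types: Sequence[str]) -> str:
    """Write a meta-path as alternating node-type and edge-type names."""
    if len(node_types) != len(edge_types) + 1:
        raise ValueError("a meta-path with k edges needs k + 1 node types")
    parts: List[str] = [node_types[0]]
    for edge, node in zip(edge_types, node_types[1:]):
        parts.extend([edge, node])
    return " ".join(parts)


def infilling_template(source: str, target: str, num_hops: int) -> str:
    """Build a word sequence with blank slots between two node types.

    A path with k hops has k edge slots and k - 1 intermediate node slots,
    so 2k - 1 slots are left for a language model to fill in.
    """
    if num_hops < 1:
        raise ValueError("num_hops must be at least 1")
    slots = [MASK] * (2 * num_hops - 1)
    return " ".join([source, *slots, target])


if __name__ == "__main__":
    # Hypothetical bibliographic HIN: author -writes-> paper -written by-> author
    print(meta_path_to_text(["author", "paper", "author"], ["writes", "written by"]))
    print(infilling_template("author", "author", num_hops=2))
```

In this reading, a language model would propose fillers for the blank slots, and each filled sequence would correspond to a candidate meta-path; how MetaFill actually constructs and scores its templates is not specified in the abstract.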