Deep learning on graphs has attracted significant interest recently. However, most existing work has focused on (semi-)supervised learning, which leads to shortcomings including heavy reliance on labels, poor generalization, and weak robustness. To address these issues, self-supervised learning (SSL), which extracts informative knowledge through well-designed pretext tasks without relying on manual labels, has become a promising and trending learning paradigm for graph data. Unlike SSL in other domains such as computer vision and natural language processing, SSL on graphs has its own distinct background, design ideas, and taxonomies. Under the umbrella of graph self-supervised learning, we present a timely and comprehensive review of existing approaches that apply SSL techniques to graph data. We construct a unified framework that mathematically formalizes the paradigm of graph SSL. According to the objectives of their pretext tasks, we divide these approaches into four categories: generation-based, auxiliary property-based, contrast-based, and hybrid approaches. We further summarize the applications of graph SSL across various research fields, along with the commonly used datasets, evaluation benchmarks, performance comparisons, and open-source code for graph SSL. Finally, we discuss the remaining challenges and potential future directions in this research field.