Deep learning on graphs has recently achieved remarkable success on a variety of tasks, but such success relies heavily on massive, carefully labeled data. However, precise annotations are generally expensive and time-consuming to obtain. To address this problem, self-supervised learning (SSL) is emerging as a new paradigm for extracting informative knowledge through well-designed pretext tasks without relying on manual labels. In this survey, we extend the concept of SSL, which first emerged in the fields of computer vision and natural language processing, to present a timely and comprehensive review of existing SSL techniques for graph data. Specifically, we divide existing graph SSL methods into three categories: contrastive, generative, and predictive. More importantly, unlike many other surveys that only provide a high-level description of published research, we present an additional mathematical summary of the existing works in a unified framework. Furthermore, to facilitate methodological development and empirical comparison, we also summarize the commonly used datasets, evaluation metrics, downstream tasks, and open-source implementations of various algorithms. Finally, we discuss the technical challenges and potential future directions for improving graph self-supervised learning.
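To make the contrastive category concrete, the sketch below illustrates one common pretext task: two randomly augmented views of the same node features are encouraged to agree under an InfoNCE-style objective. This is a minimal, hypothetical illustration using random feature masking as the augmentation and raw (unencoded) features as embeddings; actual graph SSL methods would pass the views through a graph neural network encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x, drop_prob=0.2):
    """Feature masking: randomly zero out feature dimensions,
    one common augmentation in contrastive graph SSL."""
    mask = rng.random(x.shape) > drop_prob
    return x * mask

def info_nce(z1, z2, tau=0.5):
    """InfoNCE loss between two views: row i of z1 and row i of z2
    are treated as a positive pair, all other rows as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = (z1 @ z2.T) / tau                        # pairwise cosine similarities
    logits = sim - sim.max(axis=1, keepdims=True)  # subtract row max for stability
    exp = np.exp(logits)
    pos = np.diag(exp)                             # similarity of positive pairs
    return float(np.mean(-np.log(pos / exp.sum(axis=1))))

# Toy "node features": 8 nodes with 16-dimensional attributes.
x = rng.normal(size=(8, 16))
loss = info_nce(augment(x), augment(x))
```

Minimizing this loss pulls representations of the two views of the same node together while pushing apart representations of different nodes, which is the core idea shared by the contrastive methods surveyed here.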