Self-supervised learning (SSL) has been extensively explored in recent years. In particular, generative SSL has seen emerging success in natural language processing and other AI fields, as exemplified by the wide adoption of BERT and GPT. Despite this, contrastive learning, which heavily relies on structural data augmentation and complicated training strategies, has been the dominant approach in graph SSL, while the progress of generative SSL on graphs, especially graph autoencoders (GAEs), has thus far not reached the potential promised in other fields. In this paper, we identify and examine the issues that negatively impact the development of GAEs, including their reconstruction objective, training robustness, and error metric. We present GraphMAE, a masked graph autoencoder that mitigates these issues for generative self-supervised graph pre-training. Instead of reconstructing graph structures, we propose to focus on feature reconstruction with both a masking strategy and a scaled cosine error, which benefit the robust training of GraphMAE. We conduct extensive experiments on 21 public datasets across three different graph learning tasks. The results show that GraphMAE, a simple graph autoencoder with careful designs, consistently outperforms both contrastive and generative state-of-the-art baselines. This study provides an understanding of graph autoencoders and demonstrates the potential of generative self-supervised pre-training on graphs.
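To make the two key ingredients named above concrete, the following is a minimal PyTorch-style sketch of masked feature reconstruction trained with a scaled cosine error. The function names, the `mask_rate` and `gamma` defaults, and the placeholder `encoder`/`decoder` callables (which stand in for GNNs whose graph-structure inputs are omitted) are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def scaled_cosine_error(x, z, gamma=2.0):
    """Scaled cosine error between original features x and reconstructions z.

    Both tensors have shape [num_masked_nodes, feature_dim]; gamma >= 1
    down-weights nodes that are already reconstructed well.
    """
    cos = F.cosine_similarity(x, z, dim=-1)             # per-node cosine similarity
    return ((1.0 - cos).clamp(min=0.0) ** gamma).mean() # average over masked nodes

def masked_feature_reconstruction_loss(features, encoder, decoder,
                                       mask_rate=0.5, gamma=2.0):
    """One hypothetical training objective: mask node features, encode the
    corrupted graph, decode, and score only the masked nodes."""
    num_nodes, dim = features.shape
    mask = torch.rand(num_nodes) < mask_rate   # randomly select nodes to mask
    corrupted = features.clone()
    corrupted[mask] = 0.0                      # replace masked features with a mask token (zeros here)
    hidden = encoder(corrupted)                # encode the corrupted node features
    recon = decoder(hidden)                    # decode back to the input feature space
    return scaled_cosine_error(features[mask], recon[mask], gamma)
```

In this sketch the loss is computed only on the masked nodes, so the model cannot trivially copy visible features, which is the role the masking strategy plays in making feature reconstruction a non-trivial pre-training task.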