The masked graph autoencoder (MGAE) has emerged as a promising self-supervised graph pre-training (SGP) paradigm due to its simplicity and effectiveness. However, existing methods perform the mask-then-reconstruct operation in the raw data space, as is done in computer vision (CV) and natural language processing (NLP), and neglect the non-Euclidean nature of graph data. As a result, highly unstable local connection structures increase the uncertainty in inferring masked data and reduce the reliability of the self-supervision signals, leading to inferior representations for downstream tasks. To address this issue, we propose a novel SGP method, Robust mAsked gRaph autoEncoder (RARE), which improves the certainty of inferring masked data and the reliability of the self-supervision mechanism by additionally masking and reconstructing node samples in the high-order latent feature space. Through both theoretical and empirical analyses, we find that a joint mask-then-reconstruct strategy in both the latent feature space and the raw data space yields improved stability and performance. To this end, we design a masked latent feature completion scheme that predicts the latent features of masked nodes under the guidance of high-order sample correlations, which are hard to observe from the raw data. Specifically, we first adopt a latent feature predictor to predict the masked latent features from the visible ones. Next, we encode the raw data of the masked samples with a momentum graph encoder and use the resulting representations to refine the predictions through latent feature matching. Extensive experiments on seventeen datasets demonstrate the effectiveness and robustness of RARE against state-of-the-art (SOTA) competitors across three downstream tasks.
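The two-step latent completion described above (predict masked latent features from visible ones, then match them against the output of a momentum encoder) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the one-layer `encode` function, the neighbor-averaging predictor, the cosine matching loss, and all function and parameter names are hypothetical. The raw-space reconstruction branch and gradient-based training are omitted.

```python
import numpy as np


def normalize_adj(adj):
    """Symmetrically normalize an adjacency matrix after adding self-loops."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))  # degrees >= 1 thanks to self-loops
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def encode(a_norm, x, w):
    """One-layer graph encoder; a hypothetical stand-in for a real GNN."""
    return np.tanh(a_norm @ x @ w)


def ema_update(w_target, w_online, momentum=0.99):
    """Momentum (exponential moving average) update of the target weights."""
    return momentum * w_target + (1.0 - momentum) * w_online


def cosine_matching_loss(pred, target, eps=1e-12):
    """Mean of (1 - cosine similarity) over rows."""
    pred_n = pred / (np.linalg.norm(pred, axis=1, keepdims=True) + eps)
    targ_n = target / (np.linalg.norm(target, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(pred_n * targ_n, axis=1)))


def latent_matching_loss(adj, x, mask, w_online, w_target, w_pred):
    """Latent feature matching loss for the masked nodes (illustrative only)."""
    visible = ~mask
    a_norm = normalize_adj(adj)

    # Online branch: encode only the visible subgraph.
    a_vis = normalize_adj(adj[np.ix_(visible, visible)])
    z_visible = encode(a_vis, x[visible], w_online)

    # Predictor (assumed form): aggregate visible latents along edges that
    # reach each masked node, then apply a linear map.
    z_pred = (a_norm[np.ix_(mask, visible)] @ z_visible) @ w_pred

    # Target branch: the momentum encoder sees the unmasked raw data.
    z_target = encode(a_norm, x, w_target)[mask]

    return cosine_matching_loss(z_pred, z_target)


# Toy usage on a random undirected graph with 6 nodes, 2 of them masked.
rng = np.random.default_rng(0)
n, d, h = 6, 4, 3
adj = np.triu((rng.random((n, n)) < 0.5).astype(float), 1)
adj = adj + adj.T
x = rng.normal(size=(n, d))
mask = np.array([True, False, False, True, False, False])
w_online = rng.normal(size=(d, h))
w_target = w_online.copy()
w_pred = rng.normal(size=(h, h))

loss = latent_matching_loss(adj, x, mask, w_online, w_target, w_pred)
w_target = ema_update(w_target, w_online)  # target slowly tracks the online encoder
print("latent matching loss:", loss)
```

In this sketch the target weights are never updated by the loss itself, only by the moving average. That is the usual reason for using a momentum encoder: it gives the predictor a slowly changing regression target.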