Graph instance contrastive learning has proven to be an effective task for Graph Neural Network (GNN) pre-training. However, one key issue may seriously impede the representational power of existing works: positive instances created by current methods often miss crucial information of the graphs, or even yield illegal instances (such as chemically infeasible graphs in molecular generation). To remedy this issue, we propose to select positive graph instances directly from existing graphs in the training set, which inherently preserves legality and similarity to the target graphs. Our selection is based on certain domain-specific pairwise similarity measurements, as well as on sampling from a hierarchical graph that encodes similarity relations among graphs. In addition, we develop an adaptive node-level pre-training method that dynamically masks nodes so that they are distributed evenly across the graph. We conduct extensive experiments on $13$ graph classification and node classification benchmark datasets from various domains. The results demonstrate that GNN models pre-trained with our strategies outperform both models trained from scratch and variants obtained by existing methods.
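The core idea of selecting positives from the training set, rather than generating them by augmentation, can be illustrated with a minimal sketch. Here we assume each graph is represented by some domain-specific embedding vector (the function name `select_positive_instances` and the use of cosine similarity are illustrative assumptions, not the paper's exact measurement):

```python
import numpy as np

def select_positive_instances(embeddings, target_idx, k=1):
    """Pick the k training graphs most similar to the target graph as
    positive instances, using cosine similarity over (hypothetical)
    domain-specific graph embeddings. Because the positives are real
    graphs from the training set, they are guaranteed to be legal."""
    target = embeddings[target_idx]
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(target)
    sims = embeddings @ target / np.maximum(norms, 1e-12)
    sims[target_idx] = -np.inf  # exclude the target graph itself
    return np.argsort(sims)[::-1][:k]

# Toy example: 4 graphs, each summarized by a 3-d embedding vector.
emb = np.array([[1.0, 0.0, 0.0],
                [0.9, 0.1, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
print(select_positive_instances(emb, target_idx=0, k=1))  # graph 1 is closest
```

A contrastive loss (e.g., InfoNCE) would then pull the target's representation toward these selected positives, while pushing it away from the remaining graphs in the batch.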