Graph-level representations are critical in many real-world applications, such as predicting the properties of molecules. In practice, however, precise graph annotations are generally expensive and time-consuming to obtain. To address this issue, graph contrastive learning constructs an instance discrimination task that pulls together positive pairs (augmentation pairs of the same graph) and pushes apart negative pairs (augmentation pairs of different graphs) for unsupervised representation learning. However, because the negatives for a query are sampled uniformly from all graphs, existing methods suffer from a critical sampling bias: the negatives are likely to share the same semantic structure as the query, which degrades performance. To mitigate this sampling bias, we propose Prototypical Graph Contrastive Learning (PGCL). Specifically, PGCL models the underlying semantic structure of the graph data by clustering semantically similar graphs into the same group, while simultaneously encouraging clustering consistency across different augmentations of the same graph. Given a query, it then performs negative sampling by drawing graphs from clusters that differ from the query's cluster, which ensures a semantic difference between the query and its negative samples. Moreover, for each query, PGCL reweights its negative samples based on the distance between their prototypes (cluster centroids) and the query prototype, so that negatives with a moderate prototype distance receive relatively large weights. This reweighting strategy is proved to be more effective than uniform sampling. Experimental results on various graph benchmarks demonstrate the advantages of PGCL over state-of-the-art methods. Code is publicly available at https://github.com/ha-lins/PGCL.
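The cluster-aware negative sampling and prototype-distance reweighting described above can be illustrated with a minimal NumPy sketch. It is not the released PGCL implementation. The function name `reweighted_contrastive_loss`, the temperature value, and the Gaussian-shaped weighting centered on the mean prototype distance are assumptions chosen only to make "moderate distance gets a large weight" concrete; the abstract does not specify the weighting function. Cluster assignments and prototypes are taken as given inputs (for example, from any clustering step), and the clustering-consistency term is omitted.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def reweighted_contrastive_loss(z_query, z_pos, clusters, prototypes,
                                temperature=0.2):
    """Illustrative cluster-aware, reweighted contrastive loss (hypothetical).

    z_query, z_pos : (n, d) embeddings of two augmentations of the same n graphs
    clusters       : (n,) integer cluster index of each graph
    prototypes     : (k, p) cluster centroids
    Returns (mean loss, (n, n) matrix of negative weights).
    """
    zq, zp = l2_normalize(z_query), l2_normalize(z_pos)
    protos = l2_normalize(prototypes)

    # Scaled cosine similarity between every query and every candidate.
    sim = zq @ zp.T / temperature

    # Only graphs from a different cluster than the query count as negatives.
    neg_mask = clusters[:, None] != clusters[None, :]

    # Cosine distance between the prototype of query i and of candidate j.
    proto_dist = 1.0 - protos[clusters] @ protos[clusters].T

    d = proto_dist[neg_mask]
    if d.size == 0:
        # Every graph shares one cluster: there are no valid negatives.
        weights = np.zeros_like(sim)
    else:
        # ASSUMPTION: Gaussian bump centered on the mean negative distance,
        # so negatives at a moderate prototype distance get the largest weight.
        mu, sigma = d.mean(), d.std() + 1e-8
        weights = np.exp(-((proto_dist - mu) ** 2) / (2.0 * sigma ** 2))
        weights = weights * neg_mask
        # Rescale each row so its weights sum to its number of negatives.
        row_sum = weights.sum(axis=1, keepdims=True)
        n_neg = neg_mask.sum(axis=1, keepdims=True)
        weights = weights * n_neg / np.maximum(row_sum, 1e-12)

    pos = np.exp(np.diag(sim))                  # query vs. its own positive
    neg = (weights * np.exp(sim)).sum(axis=1)   # weighted negatives
    loss = -np.log(pos / (pos + neg))
    return loss.mean(), weights
```

With uniform sampling, every entry of `weights` outside the diagonal would be 1. In this sketch, same-cluster candidates are masked to 0 and the remaining negatives are rescaled by how close their prototype distance is to the moderate range.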