This paper studies active learning (AL) on graphs, whose purpose is to discover the most informative nodes to maximize the performance of graph neural networks (GNNs). Previously, most graph AL methods have focused on learning node representations from a carefully selected labeled dataset, leaving the large amount of unlabeled data neglected. Motivated by the success of contrastive learning (CL), we propose a novel paradigm that seamlessly integrates graph AL with CL. While leveraging the power of abundant unlabeled data in a self-supervised manner, the nodes selected by AL further provide semantic information that can better guide representation learning. Besides, previous work measures the informativeness of nodes without considering the neighborhood propagation scheme of GNNs, so that noisy nodes may be selected. We argue that, due to the smoothing nature of GNNs, the central nodes of homophilous subgraphs should benefit model training the most. To this end, we present a minimax selection scheme that explicitly harnesses neighborhood information and discovers homophilous subgraphs to facilitate active selection. Comprehensive, confounding-free experiments on five public datasets demonstrate the superiority of our method over state-of-the-art baselines.