A common issue in graph learning under the semi-supervised setting is referred to as gradient scarcity. That is, learning graphs by minimizing a loss on a subset of nodes causes edges between unlabelled nodes that are far from labelled ones to receive zero gradients. The phenomenon was first described when jointly optimizing the graph and the weights of a Graph Neural Network (GNN). In this work, we give a precise mathematical characterization of this phenomenon, and prove that it also emerges in bilevel optimization, where additional dependency exists between the parameters of the problem. While for GNNs gradient scarcity occurs due to their finite receptive field, we show that it also occurs with the Laplacian regularization model, in the sense that gradient amplitudes decrease exponentially with the distance to labelled nodes. To alleviate this issue, we study several solutions: we propose to resort to latent graph learning using a Graph-to-Graph model (G2G), graph regularization to impose a prior structure on the graph, or optimizing on a larger graph than the original one with a reduced diameter. Our experiments on synthetic and real datasets validate our analysis and demonstrate the effectiveness of the proposed solutions.
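To make the phenomenon concrete, here is a minimal sketch (illustrative only, not code from the paper; the path graph, the 2-step propagation, and the squared-error loss are assumptions chosen for simplicity). It shows that, with a finite receptive field, supervision at a single labelled node yields exactly zero gradients on edge weights more than two hops away.

```python
# Minimal sketch (illustrative, not code from the paper): on a path graph,
# a 2-step propagation has a 2-hop receptive field, so a loss supervised on
# node 0 alone yields exactly zero gradients on edges more than 2 hops away.
import torch

n = 8
edges = [(i, i + 1) for i in range(n - 1)]        # path graph 0 - 1 - ... - 7
w = torch.ones(len(edges), requires_grad=True)    # learnable edge weights

# One indicator matrix per edge, so the weighted adjacency is built
# functionally and stays differentiable with respect to w.
E = torch.zeros(len(edges), n, n)
for k, (i, j) in enumerate(edges):
    E[k, i, j] = 1.0
    E[k, j, i] = 1.0
A = torch.eye(n) + (w[:, None, None] * E).sum(dim=0)   # adjacency + self-loops

X = torch.eye(n)                  # one-hot node features
H = A @ (A @ X)                   # two propagation steps = 2-hop receptive field
loss = (H[0] - 1.0).pow(2).sum()  # supervision on the single labelled node 0
loss.backward()

# Only edges (0, 1) and (1, 2) receive nonzero gradients; all others are 0.
print(w.grad)
```

Under these assumptions, letting the gradient signal reach distant edges requires either more propagation steps or shortcut edges that shrink the effective distance to labelled nodes, which is the intuition behind optimizing on a larger graph with a reduced diameter.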