Graph Neural Networks (GNNs) have been studied through the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the GNNs' training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.
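As a concrete illustration of the setting described above (not code from the paper), the sketch below trains a two-layer linearized GNN, i.e., stacked graph convolutions with the nonlinearities removed, by plain gradient descent on a tiny synthetic graph. The toy graph, random features and labels, learning rate, and variable names (S, W1, W2) are all illustrative assumptions; the point is simply that the training loss of such a linearized model shrinks geometrically toward its minimum, matching the linear-rate convergence discussed above.

```python
import numpy as np

# Toy undirected graph with 5 nodes; all quantities below (graph, features,
# labels, learning rate) are illustrative assumptions, not from the paper.
rng = np.random.default_rng(0)
n, d = 5, 4
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(n)                                  # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
S = D_inv_sqrt @ A_hat @ D_inv_sqrt                    # normalized propagation matrix

X = rng.normal(size=(n, d))    # node features
y = rng.normal(size=(n, 1))    # node-level regression targets

# Linearized 2-layer GNN: f(X) = S (S X W1) W2 (graph convolutions, no ReLU).
W1 = 0.5 * rng.normal(size=(d, d))
W2 = 0.5 * rng.normal(size=(d, 1))

lr = 0.01
for step in range(501):
    H1 = S @ X @ W1
    out = S @ H1 @ W2
    err = out - y
    loss = 0.5 * np.sum(err ** 2)
    # Gradients of the squared loss w.r.t. the two weight matrices.
    grad_W2 = H1.T @ S.T @ err
    grad_W1 = X.T @ S.T @ (S.T @ err @ W2.T)
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2
    if step % 100 == 0:
        # The loss decreases geometrically toward the best value the linear
        # model can achieve (which need not be zero for random targets).
        print(f"step {step:3d}  loss {loss:.6f}")
```

The gradients are written out by hand here only to keep the dependence on the propagation matrix S explicit; in practice one would use an autodiff framework and could add skip connections or more layers to probe the acceleration effects the abstract refers to.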