Graph convolutional networks (GCNs) and their variants have achieved great success in dealing with graph-structured data. However, it is well known that deep GCNs suffer from the over-smoothing problem, where node representations tend to become indistinguishable as more layers are stacked. Theoretical research on deep GCNs to date has focused primarily on expressive power rather than trainability, i.e., an optimization perspective. Compared to expressivity, trainability addresses a more fundamental question: given a sufficiently expressive space of models, can we successfully find a good solution with a gradient descent-based optimizer? This work fills this gap by exploiting the Graph Neural Tangent Kernel (GNTK), which governs the optimization trajectory of wide GCNs under gradient descent. We formulate the asymptotic behavior of the GNTK in the large-depth regime, which enables us to show that the trainability of wide and deep GCNs decays at an exponential rate during optimization. Additionally, we extend our theoretical framework to analyze residual-connection-like techniques, which we find can only mildly mitigate the exponential decay of trainability. To overcome the exponential-decay problem more fundamentally, we propose Critical DropEdge, a connectivity-aware and graph-adaptive sampling method, inspired by our theoretical insights on trainability. Experimental evaluation consistently confirms that our proposed method achieves better results than relevant counterparts in both the infinite-width and finite-width settings.
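To make the sampling idea behind DropEdge concrete, the sketch below shows plain random edge dropping on a graph given as an edge-index array; it is a minimal illustration only. The connectivity-aware, graph-adaptive choice of the drop rate that defines Critical DropEdge is developed in the paper and is not reproduced here, and the function name and parameters are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np


def drop_edge(edge_index: np.ndarray, drop_rate: float, rng=None) -> np.ndarray:
    """DropEdge-style sampling: randomly remove a fraction of edges.

    edge_index: (2, E) array of directed edges (source row, target row).
    drop_rate:  fraction of edges removed in this sample. Critical DropEdge
                would instead pick this rate adaptively from the connectivity
                of the input graph (not shown here).
    """
    rng = np.random.default_rng() if rng is None else rng
    num_edges = edge_index.shape[1]
    keep_mask = rng.random(num_edges) >= drop_rate  # keep each edge w.p. 1 - drop_rate
    return edge_index[:, keep_mask]


# Example: a 4-node cycle graph, dropping roughly 30% of edges per sample.
edges = np.array([[0, 1, 2, 3],
                  [1, 2, 3, 0]])
print(drop_edge(edges, drop_rate=0.3))
```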