Graph convolutional networks (GCNs) and their variants have achieved great success in dealing with graph-structured data. Nevertheless, it is well known that deep GCNs suffer from the over-smoothing problem, where node representations tend to become indistinguishable as more layers are stacked. Theoretical research on deep GCNs to date has focused primarily on expressive power rather than trainability, an optimization perspective. Compared to expressivity, trainability addresses a more fundamental question: given a sufficiently expressive space of models, can we successfully find a good solution via gradient descent-based optimizers? This work fills this gap by exploiting the Graph Neural Tangent Kernel (GNTK), which governs the optimization trajectory of wide GCNs under gradient descent. We formulate the asymptotic behavior of the GNTK in the large-depth limit, which enables us to reveal that the trainability of wide and deep GCNs decays at an exponential rate during optimization. Additionally, we extend our theoretical framework to analyze residual-connection-based techniques, which are found to only mildly mitigate the exponential decay of trainability. Inspired by our theoretical insights on trainability, we propose Critical DropEdge, a connectivity-aware and graph-adaptive sampling method, to alleviate the exponential decay problem more fundamentally. Experimental evaluation consistently confirms that our proposed method achieves better results than relevant counterparts in both the infinite-width and finite-width settings.
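For context on the family of edge-sampling techniques the abstract refers to, below is a minimal sketch of standard DropEdge-style uniform random edge dropping (Rong et al.), not the proposed Critical DropEdge, whose connectivity-aware and graph-adaptive sampling rule is not specified in this abstract. The function name drop_edge, the drop_prob parameter, and the edge-list representation are illustrative assumptions.

import numpy as np


def drop_edge(edge_index: np.ndarray, drop_prob: float, seed=None) -> np.ndarray:
    """Uniformly drop edges from a graph given as a (2, num_edges) edge list.

    Each edge is kept independently with probability 1 - drop_prob; this is the
    plain DropEdge baseline, shown only as a sketch of the general technique.
    """
    rng = np.random.default_rng(seed)
    keep_mask = rng.random(edge_index.shape[1]) >= drop_prob
    return edge_index[:, keep_mask]


if __name__ == "__main__":
    # Toy graph: 4 nodes, 5 directed edges.
    edges = np.array([[0, 0, 1, 2, 3],
                      [1, 2, 2, 3, 0]])
    print(drop_edge(edges, drop_prob=0.5, seed=0))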