Graph Convolutional Networks (GCNs) are known to suffer from performance degradation as the number of layers increases, which is usually attributed to over-smoothing. Despite the apparent consensus, we observe a discrepancy between the theoretical understanding of over-smoothing and the practical capabilities of GCNs. Specifically, we argue that over-smoothing does not necessarily occur in practice: a deeper model is provably expressive, can converge to the global optimum at a linear convergence rate, and can achieve very high training accuracy as long as it is properly trained. Despite this capacity for high training accuracy, empirical results show that deeper models generalize poorly at test time, and a theoretical understanding of this behavior remains elusive. To close this gap, we carefully analyze the generalization capability of GCNs and show that the training strategies required to achieve high training accuracy significantly deteriorate their generalization capability. Motivated by these findings, we propose a decoupled structure for GCNs that detaches weight matrices from feature propagation, preserving expressive power while ensuring good generalization. We conduct empirical evaluations on various synthetic and real-world datasets to validate our theory.
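To make the decoupling concrete, below is a minimal PyTorch sketch of one way such a structure could look: parameter-free feature propagation over the graph is separated from a learned transformation, so propagation depth and the number of weight layers are independent. The class name, the propagation depth `K`, and the MLP shape are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class DecoupledGCN(nn.Module):
    """Sketch of a decoupled GCN: propagation carries no weights,
    and the learned transformation is applied once afterwards.
    (Illustrative only; not the paper's exact model.)"""

    def __init__(self, in_dim, hidden_dim, num_classes, K=10):
        super().__init__()
        self.K = K  # propagation depth, detached from the number of weight layers
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x, adj_norm):
        # Feature propagation: K steps of smoothing with a (sparse)
        # symmetrically normalized adjacency matrix; no weight matrices involved.
        for _ in range(self.K):
            x = torch.sparse.mm(adj_norm, x)
        # Learned transformation, applied after propagation.
        return self.mlp(x)
```

Because the weights no longer interleave with propagation, one can deepen the propagation (larger `K`) without adding parameters, which is the sense in which expressive power is preserved without the training pathologies of stacking weighted layers.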