Despite their success, understanding how convolutional neural networks (CNNs) can efficiently learn high-dimensional functions remains a fundamental challenge. A popular belief is that these models harness the compositional and hierarchical structure of natural data such as images. Yet, we lack a quantitative understanding of how such structure affects performance, e.g. the rate of decay of the generalisation error with the number of training samples. In this paper we study deep CNNs in the kernel regime: i) we show that the spectrum of the corresponding kernel and its asymptotics inherit the hierarchical structure of the network; ii) we use generalisation bounds to prove that deep CNNs adapt to the spatial scale of the target function; iii) we illustrate this result by computing the rate of decay of the error in a teacher-student setting, where a deep CNN is trained on the output of another deep CNN with randomly-initialised parameters. We find that if the teacher function depends on certain low-dimensional subsets of the input variables, then the rate is controlled by the effective dimensionality of these subsets. Conversely, if the teacher function depends on the full set of input variables, then the error rate is inversely proportional to the input dimension. Interestingly, this implies that despite their hierarchical structure, the functions generated by deep CNNs are too rich to be efficiently learnable in high dimension.
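The teacher-student setting above can be illustrated with a toy numerical sketch. The snippet below is an assumption-laden stand-in, not the paper's actual setup: the teacher is a small random-weight 1D CNN (rather than the deep architectures analysed in the paper), and the student is kernel ridge regression with a Gaussian kernel in place of the CNN's own tangent kernel. It measures how the student's relative test error changes as the number of training samples n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cnn_teacher(d=16, channels=4, depth=2, filter_size=3):
    """Teacher: random-weight 1D CNN (conv + ReLU layers, linear readout).
    A toy stand-in for the paper's randomly-initialised deep CNN teacher."""
    Ws, c_in = [], 1
    for _ in range(depth):
        Ws.append(rng.standard_normal((channels, c_in, filter_size))
                  / np.sqrt(c_in * filter_size))
        c_in = channels
    w_out = rng.standard_normal(c_in * d) / np.sqrt(c_in * d)

    def f(X):                      # X: (n, d)
        h = X[:, None, :]          # (n, 1, d)
        for W in Ws:
            pad = filter_size // 2
            hp = np.concatenate([h[:, :, -pad:], h, h[:, :, :pad]], axis=2)
            out = np.zeros((h.shape[0], W.shape[0], d))
            for i in range(d):     # circular convolution, stride 1
                patch = hp[:, :, i:i + filter_size]          # (n, c_in, k)
                out[:, :, i] = np.einsum('nck,ock->no', patch, W)
            h = np.maximum(out, 0.0)                         # ReLU
        return h.reshape(len(X), -1) @ w_out
    return f

def kernel_ridge_fit_predict(Xtr, ytr, Xte, gamma, lam=1e-6):
    """Student: Gaussian-kernel ridge regression (an assumed proxy for the
    kernel-regime CNN; the paper works with the network's own kernel)."""
    def K(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    alpha = np.linalg.solve(K(Xtr, Xtr) + lam * np.eye(len(Xtr)), ytr)
    return K(Xte, Xtr) @ alpha

d = 16
teacher = random_cnn_teacher(d=d)
Xte, mses = rng.standard_normal((200, d)), []
yte = teacher(Xte)
for n in (32, 128, 512):
    Xtr = rng.standard_normal((n, d))
    pred = kernel_ridge_fit_predict(Xtr, teacher(Xtr), Xte, gamma=1.0 / d)
    mse = np.mean((pred - yte) ** 2) / np.mean(yte ** 2)  # relative test error
    mses.append(mse)
    print(f"n={n:4d}  relative test MSE = {mse:.3f}")
```

With this kind of sweep over n, the slope of log-error versus log-n gives an empirical decay rate, which is the quantity the paper characterises theoretically as a function of the teacher's effective dimensionality.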