The success of deep convolutional networks on tasks involving high-dimensional data such as images or audio suggests that they are able to efficiently approximate certain classes of functions that are not cursed by dimensionality. In this paper, we study this question theoretically and empirically through the lens of kernel methods, by considering multi-layer convolutional kernels, which have achieved good empirical performance on standard vision datasets and provide theoretical descriptions of over-parameterized convolutional networks in certain regimes. We find that while expressive kernels operating on input patches are important at the first layer, simpler polynomial kernels can suffice in higher layers for good performance. For such simplified models, we provide a precise functional description of the RKHS and its regularization properties, highlighting the role of depth in capturing interactions between different parts of the input signal, and the role of pooling in encouraging smooth dependence on the global or relative positions of such parts.