Recent empirical work has shown that hierarchical convolutional kernels inspired by convolutional neural networks (CNNs) significantly improve the performance of kernel methods in image classification tasks. A widely accepted explanation for the success of these architectures is that they encode hypothesis classes that are suitable for natural images. However, understanding the precise interplay between approximation and generalization in convolutional architectures remains a challenge. In this paper, we consider the stylized setting of covariates (image pixels) uniformly distributed on the hypercube, and fully characterize the RKHS of kernels composed of single layers of convolution, pooling, and downsampling operations. We then study the gain in sample efficiency of kernel methods using these kernels over standard inner-product kernels. In particular, we show that 1) the convolution layer breaks the curse of dimensionality by restricting the RKHS to `local' functions; 2) local pooling biases learning towards low-frequency functions, which are stable under small translations; 3) downsampling may modify the high-frequency eigenspaces but leaves the low-frequency part approximately unchanged. Notably, our results quantify how choosing an architecture adapted to the target function leads to a large improvement in the sample complexity.
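To make the construction concrete, the following is a minimal NumPy sketch of a one-layer convolutional kernel with circular patch extraction and local average pooling on hypercube covariates. The scalar kernel form h(<x_k, y_k>/q), the patch size q, and the pooling width are illustrative assumptions rather than the paper's exact definitions, and the downsampling operation is omitted for brevity.

```python
import numpy as np


def conv_pool_kernel(x, y, q, pool_width, h=np.exp):
    """Illustrative one-layer convolutional kernel with local average pooling.

    x, y       : covariate vectors of dimension d (e.g. +/-1 hypercube entries)
    q          : patch (convolution window) size, with circular wrap-around
    pool_width : width of the local pooling window over patch positions
    h          : scalar nonlinearity defining the base inner-product kernel (assumed)
    """
    d = x.shape[0]
    # Extract all d circular patches of length q from each input.
    idx = (np.arange(d)[:, None] + np.arange(q)[None, :]) % d
    xp, yp = x[idx], y[idx]                  # shape (d, q)
    # Base inner-product kernel evaluated between every pair of patches.
    base = h(xp @ yp.T / q)                  # shape (d, d)
    # Local average pooling: average the base kernel over nearby patch offsets.
    total = 0.0
    for k in range(d):
        for a in range(pool_width):
            for b in range(pool_width):
                total += base[(k + a) % d, (k + b) % d]
    return total / (d * pool_width ** 2)


# Usage: covariates drawn uniformly from the hypercube {-1, +1}^16.
rng = np.random.default_rng(0)
x, y = rng.choice([-1.0, 1.0], size=(2, 16))
print(conv_pool_kernel(x, y, q=4, pool_width=2))
```

Setting pool_width = 1 recovers a purely convolutional (unpooled) kernel, which in the paper's terminology restricts the RKHS to `local' functions; increasing pool_width biases the kernel towards functions that are approximately translation-invariant.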