The empirical success of deep convolutional networks on tasks involving high-dimensional data such as images or audio suggests that they can efficiently approximate certain functions that are well-suited for such tasks. In this paper, we study this through the lens of kernel methods, by considering simple hierarchical kernels with two or three convolution and pooling layers, inspired by convolutional kernel networks. These kernels achieve good empirical performance on standard vision datasets, while providing a description of the function space that is simple enough to shed light on their inductive bias. We show that the RKHS consists of additive models of interaction terms between patches, and that its norm encourages structured spatial similarities between these terms through the pooling layers. We then provide generalization bounds which illustrate how pooling yields improved sample complexity guarantees when the target function exhibits such regularities.
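To make the kernel construction concrete, the sketch below evaluates a small hierarchical convolutional kernel exactly on 1-D signals: each layer extracts patches, applies a dot-product kernel to normalized patch similarities, and average-pools the resulting positionwise kernel matrices, ending with global average pooling. This is a simplified illustration rather than the paper's implementation; the exponential dot-product kernel `kappa`, the patch and pooling sizes, and the per-layer patch normalization are assumptions made for the example.

```python
import numpy as np

def kappa(u, alpha=2.0):
    """Exponential dot-product kernel kappa(u) = exp(alpha * (u - 1)),
    applied pointwise to cosine similarities in [-1, 1] (an assumption
    for this sketch)."""
    return np.exp(alpha * (u - 1.0))

def patch_similarities(Kxy, Kxx, Kyy, size, eps=1e-8):
    """Aggregate positionwise kernel values over `size` consecutive offsets
    (the patch-extraction / convolution step) and return cosine similarities
    between patches of the two feature maps."""
    def agg(K):
        n, m = K.shape[0] - size + 1, K.shape[1] - size + 1
        out = np.zeros((n, m))
        for d in range(size):
            out += K[d:d + n, d:d + m]
        return out
    Axy, Axx, Ayy = agg(Kxy), agg(Kxx), agg(Kyy)
    nx = np.sqrt(np.diag(Axx)) + eps
    ny = np.sqrt(np.diag(Ayy)) + eps
    return (Axy / np.outer(nx, ny),
            Axx / np.outer(nx, nx),
            Ayy / np.outer(ny, ny))

def pool(K, size):
    """Average pooling with non-overlapping windows of width `size` along
    both spatial axes: entry (i, j) becomes the mean of K over the i-th
    and j-th windows."""
    n, m = K.shape[0] // size, K.shape[1] // size
    K = K[:n * size, :m * size]
    return K.reshape(n, size, m, size).mean(axis=(1, 3))

def conv_kernel(x, y, patch_sizes=(3, 3), pool_sizes=(2, 2), alpha=2.0):
    """Exact evaluation of a two-layer convolutional kernel on 1-D signals
    x, y of shape (length, channels): alternating patch extraction,
    dot-product kernel, and average pooling, followed by global average
    pooling. Patch/pooling sizes and alpha are illustrative choices."""
    Kxy, Kxx, Kyy = x @ y.T, x @ x.T, y @ y.T
    for p, w in zip(patch_sizes, pool_sizes):
        Sxy, Sxx, Syy = patch_similarities(Kxy, Kxx, Kyy, p)
        Kxy, Kxx, Kyy = kappa(Sxy, alpha), kappa(Sxx, alpha), kappa(Syy, alpha)
        Kxy, Kxx, Kyy = pool(Kxy, w), pool(Kxx, w), pool(Kyy, w)
    return Kxy.mean()  # global average pooling

# Example: kernel value between two random 1-D signals with 3 channels.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(32, 3)), rng.normal(size=(32, 3))
print(conv_kernel(x, y))
```

With a single layer, the returned value reduces to an average of patchwise kernel evaluations, which matches the additive structure over patch interactions described above; the pooling steps are what introduce the spatial smoothness favored by the RKHS norm.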