Convolutional neural networks perform a local and translationally invariant treatment of the data: quantifying which of these two aspects is central to their success remains a challenge. We study this problem within a teacher-student framework for kernel regression, using `convolutional' kernels inspired by the neural tangent kernel of simple convolutional architectures of given filter size. Using heuristic methods from physics, we find in the ridgeless case that locality is key in determining the learning curve exponent $\beta$ (which relates the test error $\epsilon_t\sim P^{-\beta}$ to the size of the training set $P$), whereas translational invariance is not. In particular, if the filter size of the teacher $t$ is smaller than that of the student $s$, $\beta$ is a function of $s$ only and does not depend on the input dimension. We confirm our predictions on $\beta$ empirically. Theoretically, in some cases (including when teacher and student kernels are equal) it can be shown that this prediction is an upper bound on performance. We conclude by proving, under a natural universality assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to learning curve exponents similar to those obtained in the ridgeless case.
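Below is a minimal numerical sketch of the setup described in the abstract, not the paper's exact construction: Gaussian kernels on cyclic input patches stand in for the NTK-inspired convolutional kernels, a teacher target is drawn from a Gaussian process with filter size $t$, a student performs (near-)ridgeless kernel regression with filter size $s$, and the learning curve exponent $\beta$ is read off a log-log fit of the test error against $P$. All parameter values and function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_kernel(X, Y, filter_size):
    """Local, translationally averaged ('convolutional') kernel: a Gaussian
    kernel acting on each cyclic patch of `filter_size` coordinates, averaged
    over all patch positions. (A stand-in for the paper's NTK-inspired kernels.)"""
    n_x, d = X.shape
    K = np.zeros((n_x, Y.shape[0]))
    for start in range(d):
        idx = [(start + j) % d for j in range(filter_size)]
        Xp, Yp = X[:, idx], Y[:, idx]
        sq = (np.sum(Xp**2, 1)[:, None] + np.sum(Yp**2, 1)[None, :]
              - 2.0 * Xp @ Yp.T)
        K += np.exp(-sq / (2.0 * filter_size))
    return K / d

def sample_teacher(K):
    """Draw a Gaussian-process target with covariance K (the teacher kernel)."""
    L = np.linalg.cholesky(K + 1e-8 * np.eye(K.shape[0]))
    return L @ rng.standard_normal(K.shape[0])

d, t, s = 8, 2, 4                  # input dimension, teacher/student filter sizes (t < s)
P_test = 500
P_list = [64, 128, 256, 512, 1024]
errors = []

for P in P_list:
    errs = []
    for _ in range(5):                                      # average over realisations
        X = rng.standard_normal((P + P_test, d))
        X /= np.linalg.norm(X, axis=1, keepdims=True)       # data on the unit sphere
        y = sample_teacher(conv_kernel(X, X, t))            # teacher with filter size t
        Xtr, Xte, ytr, yte = X[:P], X[P:], y[:P], y[P:]
        Ks = conv_kernel(Xtr, Xtr, s)                       # student with filter size s
        alpha = np.linalg.solve(Ks + 1e-8 * np.eye(P), ytr) # (near-)ridgeless regression
        pred = conv_kernel(Xte, Xtr, s) @ alpha
        errs.append(np.mean((pred - yte) ** 2))
    errors.append(np.mean(errs))

# Learning curve exponent beta from a log-log fit of epsilon_t ~ P^{-beta}
beta = -np.polyfit(np.log(P_list), np.log(errors), 1)[0]
print(f"estimated beta ~ {beta:.2f}")
```

The average over `start` mimics the translational averaging of a convolutional architecture; keeping a single fixed set of patches instead gives a kernel that is local but not translationally invariant, which, according to the abstract's claim, should lead to the same exponent $\beta$.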