A number of machine learning tasks entail a high degree of invariance: the data distribution does not change if we act on the data with a certain group of transformations. For instance, labels of images are invariant under translations of the images. Certain neural network architectures -- for instance, convolutional networks -- are believed to owe their success to the fact that they exploit such invariance properties. With the objective of quantifying the gain achieved by invariant architectures, we introduce two classes of models: invariant random features and invariant kernel methods. The latter include, as a special case, the neural tangent kernel of convolutional networks with global average pooling. We consider covariates distributed uniformly on the sphere or the hypercube, and a general invariant target function. We characterize the test error of invariant methods in a high-dimensional regime in which the sample size and the number of hidden units scale as polynomials in the dimension, for a class of groups that we call `degeneracy $\alpha$', with $\alpha \leq 1$. We show that exploiting invariance in the architecture saves a $d^\alpha$ factor (where $d$ denotes the dimension) in the sample size and the number of hidden units needed to achieve the same test error as for unstructured architectures. Finally, we show that output symmetrization of an unstructured kernel estimator does not yield a significant statistical improvement; on the other hand, data augmentation with an unstructured kernel estimator is equivalent to an invariant kernel estimator, and enjoys the same improvement in statistical efficiency.
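To make the invariant random features construction concrete, the following is a minimal sketch for the cyclic group acting on the hypercube by coordinate shifts: each feature averages a random-features nonlinearity over the group orbit, so the resulting model is shift-invariant by construction. The function names (`cyclic_orbit`, `invariant_features`), the choice of activation, the target function, and the ridge-regression fit are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def cyclic_orbit(x):
    """All d cyclic shifts of x: the orbit of x under the cyclic group."""
    d = x.shape[0]
    return np.stack([np.roll(x, s) for s in range(d)])  # shape (d, d)

def invariant_features(X, W, act=np.tanh):
    """Invariant features phi_i(x) = (1/|G|) * sum_{g in G} act(<w_i, g.x>).

    X : (n, d) data matrix, W : (N, d) random weights.
    Averaging over the orbit makes each feature exactly shift-invariant.
    """
    n, _ = X.shape
    feats = np.zeros((n, W.shape[0]))
    for j, x in enumerate(X):
        orbit = cyclic_orbit(x)                   # (d, d)
        feats[j] = act(orbit @ W.T).mean(axis=0)  # average over group elements
    return feats

rng = np.random.default_rng(0)
d, n, N = 16, 200, 300
X = rng.choice([-1.0, 1.0], size=(n, d))        # uniform covariates on the hypercube
y = np.mean(X * np.roll(X, 1, axis=1), axis=1)  # a shift-invariant target function
W = rng.standard_normal((N, d)) / np.sqrt(d)    # random first-layer weights

Phi = invariant_features(X, W)
lam = 1e-3  # illustrative ridge parameter
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)  # ridge regression
# The fitted predictor x -> invariant_features(x, W) @ a is cyclic-invariant
# by construction; an unstructured model would instead use act(X @ W.T).
```

Under the assumptions above, the same averaging applied to a kernel, $K_{\mathrm{inv}}(x, x') = \frac{1}{|G|}\sum_{g \in G} K(x, g \cdot x')$, gives the invariant kernel counterpart; this is also why kernel ridge regression on group-augmented data coincides with the invariant kernel estimator.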