Many supervised learning problems involve high-dimensional data such as images, text, or graphs. In order to make efficient use of data, it is often useful to leverage certain geometric priors in the problem at hand, such as invariance to translations, permutation subgroups, or stability to small deformations. We study the sample complexity of learning problems where the target function presents such invariance and stability properties, by considering spherical harmonic decompositions of such functions on the sphere. We provide non-parametric rates of convergence for kernel methods, and show improvements in sample complexity by a factor equal to the size of the group when using an invariant kernel over the group, compared to the corresponding non-invariant kernel. These improvements are valid when the sample size is large enough, with an asymptotic behavior that depends on spectral properties of the group. Finally, these gains are extended beyond invariance groups to also cover geometric stability to small deformations, modeled here as subsets (not necessarily subgroups) of permutations.
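To make the group-averaging construction concrete, below is a minimal illustrative sketch (not the paper's experimental setup). It builds an invariant kernel by averaging a base kernel over a finite group, K_G(x, x') = (1/|G|) Σ_{g∈G} K(x, g·x'), and compares kernel ridge regression with the invariant versus the plain kernel on a toy shift-invariant target. The choice of a Gaussian base kernel, the cyclic group of coordinate shifts, and the toy target function are all assumptions made here for illustration.

```python
import numpy as np

def base_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def invariant_kernel(X, Y, gamma=1.0):
    """Group-averaged kernel K_G(x, y) = (1/|G|) sum_g K(x, g.y),
    with G the cyclic group of coordinate shifts (|G| = d).
    Averaging over one argument suffices because the RBF kernel
    satisfies K(g.x, g.y) = K(x, y) for permutations g."""
    d = X.shape[1]
    K = np.zeros((X.shape[0], Y.shape[0]))
    for s in range(d):
        K += base_kernel(X, np.roll(Y, s, axis=1), gamma)
    return K / d

def krr_fit_predict(Ktrain, y, Ktest, lam=1e-3):
    """Kernel ridge regression: solve (K + n*lam*I) alpha = y,
    then predict with Ktest @ alpha."""
    n = Ktrain.shape[0]
    alpha = np.linalg.solve(Ktrain + n * lam * np.eye(n), y)
    return Ktest @ alpha

# Toy experiment: inputs on the sphere, target invariant to cyclic shifts.
rng = np.random.default_rng(0)
d, n, m = 8, 200, 500
Xtr = rng.standard_normal((n, d)); Xtr /= np.linalg.norm(Xtr, axis=1, keepdims=True)
Xte = rng.standard_normal((m, d)); Xte /= np.linalg.norm(Xte, axis=1, keepdims=True)
f = lambda X: np.sin(3 * X).sum(axis=1)  # permutation-invariant, hence shift-invariant
ytr, yte = f(Xtr), f(Xte)

for name, k in [("plain", base_kernel), ("invariant", invariant_kernel)]:
    pred = krr_fit_predict(k(Xtr, Xtr), ytr, k(Xte, Xtr))
    print(name, "test MSE:", np.mean((pred - yte) ** 2))
```

Under this construction the invariant kernel restricts the hypothesis space to G-invariant functions, which is the mechanism behind the |G|-fold sample-complexity gain described above; on the toy problem the invariant kernel should achieve lower test error at the same sample size.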