It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parametrizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization of interest (non-linear but regular networks) no tight characterization has yet been achieved, despite significant developments. We take a step in this direction by considering depth-2 neural networks trained by SGD in the mean-field regime. We consider functions on binary inputs that depend on a latent low-dimensional subspace (i.e., a small number of coordinates). This regime is of interest since it is poorly understood how neural networks routinely tackle high-dimensional datasets and adapt to latent low-dimensional structure without suffering from the curse of dimensionality. Accordingly, we study SGD-learnability with $O(d)$ sample complexity in a large ambient dimension $d$. Our main results characterize a hierarchical property, the "merged-staircase property", that is both necessary and nearly sufficient for learning in this setting. We further show that non-linear training is necessary: for this class of functions, linear methods on any feature map (e.g., the NTK) are not capable of learning efficiently. The key tools are a new "dimension-free" dynamics approximation result that applies to functions defined on a latent space of low dimension, a proof of global convergence based on polynomial identity testing, and an improvement of lower bounds against linear methods for non-almost-orthogonal functions.
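For concreteness, here is a minimal sketch of the hierarchical structure in question, stated in the Fourier-expansion formulation for binary inputs (the precise definition appears in the body of the paper). Write a function of $k$ latent binary coordinates as
$$ f(z) = \sum_{S \subseteq [k]} \hat{f}(S) \prod_{i \in S} z_i, \qquad z \in \{+1,-1\}^k. $$
The merged-staircase property asks that the sets $S$ with nonzero coefficients admit an ordering $S_1, \dots, S_m$ in which each set introduces at most one coordinate not already seen, i.e., $|S_i \setminus (S_1 \cup \dots \cup S_{i-1})| \le 1$ for all $i$. For example, $z_1 + z_1 z_2 + z_1 z_2 z_3$ satisfies the property, since each monomial adds one new coordinate, whereas the isolated monomial $z_1 z_2 z_3$ does not, since all three coordinates must be discovered at once.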