We prove limitations on what neural networks trained by noisy gradient descent (GD) can efficiently learn. Our results apply whenever GD training is equivariant, which holds for many standard architectures and initializations. As applications, (i) we characterize the functions that fully-connected networks can weak-learn on the binary hypercube and unit sphere, demonstrating that depth-2 is as powerful as any other depth for this task; (ii) we extend the merged-staircase necessity result for learning with latent low-dimensional structure [ABM22] beyond the mean-field regime. Under cryptographic assumptions, we also show hardness results for learning with fully-connected networks trained by stochastic gradient descent (SGD).
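As an illustrative sketch of the equivariance condition (the group $G$, the notation $\mathcal{A}$ for the training algorithm, and the distributional-equality statement below are our own shorthand, not necessarily the paper's formalization): if a group $G$ acts on the input space and both the architecture and the initialization distribution are $G$-invariant, then for every $g \in G$,
\[
\mathcal{A}\big(\{(g \cdot x_i,\, y_i)\}_{i=1}^n\big) \circ g \;\overset{d}{=}\; \mathcal{A}\big(\{(x_i,\, y_i)\}_{i=1}^n\big),
\]
where $\mathcal{A}(S)$ denotes the (random) function output by noisy GD trained on dataset $S$; that is, training on transformed data yields, in distribution, the original trained function precomposed with the transformation.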