We prove computational limitations on learning with neural networks trained by noisy gradient descent (GD). Our result applies whenever GD training is equivariant (which holds for many standard architectures) and quantifies the alignment needed between architecture and data in order for GD to learn. As applications, (i) we characterize the functions that fully-connected networks can weak-learn on the binary hypercube and unit sphere, demonstrating that depth-2 is as powerful as any other depth for this task; (ii) we extend the merged-staircase necessity result for learning with latent low-dimensional structure [ABM22] beyond the mean-field regime. Our techniques extend to stochastic gradient descent (SGD), for which we show, under cryptographic assumptions, nontrivial hardness results for learning with fully-connected networks.
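As a minimal sketch of the equivariance condition referenced above (the symbols $\mathcal{A}$, $G$, and $g \cdot S$ are illustrative notation introduced here, not taken from the abstract), one way to state that a randomized training procedure is equivariant under a group acting on the inputs is:

% Hedged sketch, not the paper's formal definition: \mathcal{A} denotes a
% randomized training algorithm mapping a dataset S to a trained predictor,
% G a symmetry group acting on inputs, and g \cdot S the dataset with every
% input transformed by g.
\[
  \mathcal{A}(g \cdot S) \;\stackrel{d}{=}\; \mathcal{A}(S) \circ g^{-1}
  \qquad \text{for all } g \in G \text{ and all datasets } S,
\]
% i.e., training on transformed data yields, in distribution over the
% algorithm's internal randomness (initialization and gradient noise), the
% predictor obtained from the original data composed with the inverse
% transformation.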