Depth separation, the question of why a deeper network is more powerful than a shallower one, has been a central problem in deep learning theory. Prior results focus mostly on representation power. For example, arXiv:1904.06984 constructed a function that is easy to approximate with a 3-layer network but cannot be approximated by any 2-layer network of polynomial width. In this paper, we show that this separation is in fact algorithmic: the function constructed by arXiv:1904.06984 can be learned efficiently with an overparameterized network of polynomially many neurons. Our result relies on a new way of extending the mean-field limit to multilayer networks, and on a loss decomposition that factors out the error introduced by discretizing the infinite-width mean-field network.
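To make the discretization viewpoint concrete, here is a minimal, hypothetical sketch, not the paper's construction: under the standard two-layer mean-field scaling, a finite-width network whose neurons are drawn i.i.d. from a population distribution is a Monte Carlo discretization of the infinite-width mean-field network, so its output deviates from the mean-field limit by an error that shrinks with width. All names, the two-layer setting, and the Gaussian population are illustrative assumptions; the paper extends the mean-field limit to multilayer networks.

```python
import numpy as np

# Illustrative sketch (not the paper's method): a finite-width two-layer
# network f_m(x) = (1/m) * sum_i a_i * relu(w_i . x) under mean-field
# scaling is a Monte Carlo discretization of the infinite-width network
# f(x) = E_{(a, w) ~ rho}[a * relu(w . x)], so the loss splits into the
# mean-field loss plus a discretization error that decays as m grows.

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def finite_width_net(x, a, W):
    # mean-field scaling: average (not sum) over the m neurons
    m = a.shape[0]
    return (1.0 / m) * a @ relu(W @ x)

def mean_field_net(x, a_pop, W_pop):
    # "infinite-width" network, approximated by a large population sample
    return np.mean(a_pop * relu(W_pop @ x))

d = 10
x = rng.normal(size=d)

# large population standing in for the mean-field distribution rho
# (a standard Gaussian here, purely for illustration)
N = 200_000
a_pop = rng.normal(size=N)
W_pop = rng.normal(size=(N, d))
f_mf = mean_field_net(x, a_pop, W_pop)

# finite networks of growing width, with neurons sampled from the same rho
for m in [10, 100, 1_000, 10_000]:
    idx = rng.choice(N, size=m, replace=False)
    f_m = finite_width_net(x, a_pop[idx], W_pop[idx])
    # discretization error |f_m - f_mf| decays roughly like O(1/sqrt(m))
    print(f"m={m:6d}  |f_m - f_mf| = {abs(f_m - f_mf):.4f}")
```

Running the sketch shows the gap between the finite-width output and the mean-field limit shrinking as the width m increases, which is the kind of discretization error the abstract's loss decomposition factors out.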