Deep learning has achieved considerable empirical success in recent years. However, while practitioners have discovered many ad hoc tricks, until recently there was little theoretical understanding of the techniques developed in the deep learning literature. It has long been known to practitioners that overparameterized neural networks are easy to train, and in the past few years there have been important theoretical developments in the analysis of such networks. In particular, it was shown that these systems behave like convex systems in various restricted settings, such as two-layer neural networks, or when learning is restricted locally to the so-called neural tangent kernel space around specialized initializations. This paper discusses some of this recent progress, which has led to a significantly better understanding of neural networks. We focus on the analysis of two-layer neural networks and explain the key mathematical models, along with their algorithmic implications. We then discuss challenges in understanding deep neural networks and some current research directions.
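To illustrate the neural tangent kernel setting mentioned above, the following is a minimal sketch; the notation ($f$, $\theta_0$, $m$, $a_j$, $w_j$, $\sigma$) is introduced here for illustration and is not taken from the paper. A two-layer network of width $m$ can be written as
\[
  f(x;\theta) \;=\; \frac{1}{\sqrt{m}} \sum_{j=1}^{m} a_j\, \sigma(w_j^\top x),
  \qquad \theta = (a_1,\dots,a_m,\, w_1,\dots,w_m).
\]
In the neural tangent kernel regime, the network is analyzed through its first-order expansion around a random initialization $\theta_0$,
\[
  f(x;\theta) \;\approx\; f(x;\theta_0) \;+\; \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0),
\]
which is linear in the parameters. With a convex loss, training the linearized model is therefore a convex problem, corresponding to kernel regression with the tangent kernel $K(x,x') = \nabla_\theta f(x;\theta_0)^\top \nabla_\theta f(x';\theta_0)$. This is one sense in which overparameterized networks "behave like convex systems" locally around specialized initializations.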