Understanding the fundamental principles behind the success of deep neural networks is one of the most important open questions in the current literature. To this end, we study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape. We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as special cases. We then show that the path regularized training problem can be represented as an exact convex optimization problem. We further prove that the equivalent convex problem is regularized via a group sparsity inducing norm. Thus, a path regularized parallel ReLU network can be viewed as a parsimonious convex model in high dimensions. More importantly, we show that the computational complexity required to globally optimize the equivalent convex problem is fully polynomial-time in the feature dimension and the number of samples. Therefore, we prove polynomial-time trainability of path regularized ReLU networks with global optimality guarantees. We also provide several numerical experiments corroborating our theory.
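To make the two ingredients mentioned above concrete, the sketch below illustrates (i) a path-norm penalty on a single two-layer ReLU block and (ii) a group-lasso convex surrogate built from sampled ReLU activation patterns. It is a minimal illustration under assumed choices (squared loss, a pattern-sampling heuristic, the cvxpy solver, and illustrative variable names), not the exact convex program or the complexity-guaranteed algorithm derived in the paper.

```python
# Illustrative sketch only: a path-norm penalty for one two-layer ReLU block and a
# group-lasso convex surrogate over sampled ReLU activation patterns. This mirrors
# the general "convex reformulation + group sparsity" idea, but it is NOT the exact
# convex program from the paper; loss, sampling scheme, and names are assumptions.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, m = 20, 3, 8                      # samples, feature dimension, hidden neurons
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Path-norm style regularizer for f(x) = sum_j w2[j] * relu(x @ W1[:, j]):
# sum over input-output paths of |w2[j]| * ||W1[:, j]||_2 (one common form).
W1 = rng.standard_normal((d, m))
w2 = rng.standard_normal(m)
path_penalty = np.sum(np.abs(w2) * np.linalg.norm(W1, axis=0))

# Convex surrogate: sample ReLU activation patterns D_i = diag(1[X g >= 0]) and
# solve a group-lasso problem over per-pattern weight vectors (v_i, u_i).
P = 16                                  # number of sampled patterns (assumption)
G = rng.standard_normal((d, P))
patterns = (X @ G >= 0).astype(float)   # n x P matrix of 0/1 activation indicators

V = cp.Variable((d, P))
U = cp.Variable((d, P))
lam = 0.1
preds = sum(cp.multiply(patterns[:, i], X @ (V[:, i] - U[:, i])) for i in range(P))
group_norms = sum(cp.norm(V[:, i]) + cp.norm(U[:, i]) for i in range(P))
constraints = []
for i in range(P):
    sign = 2 * patterns[:, i] - 1       # keep X v_i consistent with its pattern
    constraints += [cp.multiply(sign, X @ V[:, i]) >= 0,
                    cp.multiply(sign, X @ U[:, i]) >= 0]
prob = cp.Problem(cp.Minimize(cp.sum_squares(preds - y) + lam * group_norms),
                  constraints)
prob.solve()
print(path_penalty, prob.value)
```

In this toy surrogate the sum of per-pattern Euclidean norms acts as the group sparsity inducing regularizer: at the optimum many of the pattern-wise weight blocks are driven exactly to zero, which is the parsimony property the abstract refers to.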