We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of the hidden layers. We show that a set of optimal hidden layer weights for a norm-regularized DNN training problem can be explicitly found as the extreme points of a convex set. For the special case of deep linear networks, we prove that each optimal weight matrix aligns with the previous layers via duality. More importantly, we apply the same characterization to deep ReLU networks with whitened data and prove that the same weight alignment holds. As a corollary, we also prove that norm-regularized deep ReLU networks yield spline interpolation for one-dimensional datasets, a result previously known only for two-layer networks. Furthermore, we provide closed-form solutions for the optimal layer weights when the data is rank-one or whitened. The same analysis also applies to architectures with batch normalization, even for arbitrary data. Therefore, we obtain a complete explanation for a recent empirical observation termed Neural Collapse, in which class means collapse to the vertices of a simplex equiangular tight frame.
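For background on the final claim, a simplex equiangular tight frame consists of $K$ equal-norm directions with identical, maximally negative pairwise correlations; a standard construction (stated here only for reference, with $e_k$ the $k$-th standard basis vector and $\mathbf{1}$ the all-ones vector, not notation taken from this abstract) is
\[
\mu_k \;=\; \sqrt{\tfrac{K}{K-1}}\Big(e_k - \tfrac{1}{K}\mathbf{1}\Big), \qquad k = 1,\dots,K,
\]
which satisfies $\|\mu_k\|_2 = 1$ and $\langle \mu_j, \mu_k \rangle = -\tfrac{1}{K-1}$ for all $j \neq k$; Neural Collapse refers to the class means converging to such a configuration, up to rotation and scaling.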