Network pruning, or network sparsification, has a long history and practical significance in modern applications. The loss surface of a dense neural network can exhibit a bad landscape due to non-convexity and nonlinear activations, but over-parameterization may lead to benign geometrical properties. In this paper, we study sparse networks with the squared loss objective, showing that, like dense networks, sparse networks can still preserve a benign landscape when the width of the last hidden layer is larger than the number of training samples. Our results cover general sparse linear networks, linear CNNs (a special class of sparse networks), and nonlinear sparse networks. We also present counterexamples when certain assumptions are violated, implying that these assumptions are necessary for our results.
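As a minimal sketch of the setting described above, assuming standard notation not spelled out in the abstract (a network $f_\theta$, $n$ training pairs $(x_i, y_i)$, and last-hidden-layer width $d_{L-1}$), the squared loss objective and the width condition can be written as:

\begin{equation*}
  L(\theta) \;=\; \frac{1}{2} \sum_{i=1}^{n} \bigl\| f_\theta(x_i) - y_i \bigr\|_2^2,
  \qquad \text{with the benign-landscape claim requiring } d_{L-1} \ge n .
\end{equation*}

Here "benign landscape" refers informally to the absence of bad local geometry (e.g., spurious local minima); the precise statement depends on the assumptions made in the paper.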