This work characterizes the effect of depth on the optimization landscape of linear regression, showing that, despite their nonconvexity, deeper models enjoy a more favorable optimization landscape. We consider a robust and over-parameterized setting, where a subset of the measurements is grossly corrupted by noise and the true linear model is captured via an $N$-layer linear neural network. On the negative side, we show that this problem \textit{does not} have a benign landscape: for any $N\geq 1$, with constant probability, there exists a solution corresponding to the ground truth that is neither a local nor a global minimum. However, on the positive side, we prove that, for any $N$-layer model with $N\geq 2$, a simple sub-gradient method becomes oblivious to such ``problematic'' solutions; instead, it converges to a balanced solution that is not only close to the ground truth but also enjoys a flat local landscape, thereby eschewing the need for ``early stopping''. Lastly, we empirically verify that the desirable optimization landscape of deeper models extends to other robust learning tasks, including deep matrix recovery and deep ReLU networks with $\ell_1$-loss.
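The setting described above can be sketched in a few lines of code: robust ($\ell_1$-loss) linear regression whose parameter vector is re-expressed as the end-to-end map of an $N$-layer linear network and trained by a plain sub-gradient method. The sketch below is illustrative only and is not the authors' implementation; the specific parameterization ($\theta = W_1 W_2 \cdots W_{N-1} w_N$), the near-identity initialization, the corruption level, the constant step size, and the iteration budget are all assumptions made for the example.

```python
# Minimal sketch (not the paper's code): sub-gradient descent on an
# l1-loss, N-layer linear re-parameterization of robust linear regression.
# Data model, layer shapes, step size, and corruption fraction are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
m, d, N = 200, 20, 3           # measurements, dimension, network depth
step, iters = 1e-2, 5000       # constant step size, iteration budget

# Ground-truth model and measurements; a fraction of rows is grossly corrupted.
theta_star = rng.standard_normal(d)
A = rng.standard_normal((m, d))
y = A @ theta_star
outliers = rng.random(m) < 0.2                  # ~20% corrupted measurements
y[outliers] += 10.0 * rng.standard_normal(outliers.sum())

# N-layer linear model: theta(W) = W_1 W_2 ... W_{N-1} w_N.
Ws = [np.eye(d) + 0.01 * rng.standard_normal((d, d)) for _ in range(N - 1)]
w_last = 0.01 * rng.standard_normal(d)

def end_to_end(Ws, w_last):
    """Collapse the linear layers into the effective regression vector."""
    theta = w_last
    for W in reversed(Ws):
        theta = W @ theta
    return theta

for _ in range(iters):
    theta = end_to_end(Ws, w_last)
    s = np.sign(A @ theta - y)                  # sub-gradient of the l1 loss
    g_theta = A.T @ s / m                       # dL/dtheta
    # Chain rule through the product of layers (simultaneous update).
    prefix = np.eye(d)                          # W_1 ... W_{i-1}
    for i, W in enumerate(Ws):
        suffix = w_last                         # W_{i+1} ... W_{N-1} w_N
        for Wj in reversed(Ws[i + 1:]):
            suffix = Wj @ suffix
        Ws[i] = W - step * np.outer(prefix.T @ g_theta, suffix)
        prefix = prefix @ W
    w_last = w_last - step * (prefix.T @ g_theta)

rel_err = np.linalg.norm(end_to_end(Ws, w_last) - theta_star) \
          / np.linalg.norm(theta_star)
print("relative error:", rel_err)
```

In this toy run the sub-gradient iterates are expected to approach the ground truth despite the gross corruptions, loosely mirroring the paper's claim that, for $N\geq 2$, the method converges to a solution near the ground truth without early stopping; the code is a sketch of that experiment, not a reproduction of the paper's results.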