In non-convex settings, it is well established that gradient-based algorithms behave differently in the vicinity of the local structures of the objective function, such as strict and non-strict saddle points and local and global minima and maxima. It is therefore crucial to describe the landscape of non-convex problems, that is, to describe as precisely as possible the set of points in each of the above categories. In this work, we study the landscape of the empirical risk associated with deep linear neural networks and the square loss. It is known that, under weak assumptions, this objective function has no spurious local minima and no local maxima. We go a step further and characterize, among all critical points, which ones are global minimizers, strict saddle points, and non-strict saddle points, and we enumerate all the associated critical values. The characterization is simple, involves conditions on the ranks of partial matrix products, and sheds some light on the global convergence and implicit regularization phenomena that have been proved or observed when optimizing linear neural networks. In passing, we also provide an explicit parameterization of the set of all global minimizers and exhibit large sets of strict and non-strict saddle points.
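For concreteness, here is a minimal sketch of the kind of objective referred to above, assuming the standard formulation of a depth-$H$ linear network with weight matrices $W_1,\dots,W_H$, input data $X$, and targets $Y$ (the symbols and the precise assumptions on the data are illustrative, not those stated in the paper):
\[
L(W_1,\dots,W_H) \;=\; \tfrac{1}{2}\,\bigl\| W_H W_{H-1} \cdots W_1 X - Y \bigr\|_F^2 .
\]
In this setting, the rank conditions mentioned in the abstract concern partial matrix products of the form $W_j W_{j-1} \cdots W_i$ for $1 \le i \le j \le H$.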