In this paper, we follow Eftekhari's work and give a non-local convergence analysis of deep linear networks. Specifically, we consider optimizing deep linear networks that contain a layer with a single neuron under the quadratic loss. We characterize the limit point of gradient-flow trajectories started from an arbitrary initialization, including paths that converge to one of the saddle points or to the origin. We also establish explicit, stage-wise convergence rates for trajectories that converge to the global minimizer. To obtain these results, this paper mainly extends the machinery of Eftekhari's work to provably identify the rank-stable set and the set of initializations that converge to the global minimizer. We also give concrete examples demonstrating the necessity of our definitions. Crucially, to the best of our knowledge, our results are the first to give a non-local, global analysis of linear neural networks from arbitrarily initialized points, in contrast to the lazy training regime that has dominated the neural network literature and the restricted benign initialization in Eftekhari's work. We also note that extending our results to general linear networks, without the one-hidden-neuron assumption, remains a challenging open problem.
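To make the setting concrete, the following is a minimal sketch (not taken from the paper) of the optimization problem described above: gradient descent, viewed as a discretization of gradient flow, applied to a deep linear network whose hidden layers include one layer of width one, trained under the quadratic loss from an arbitrary initialization. The depth, layer widths, data, and step size below are illustrative assumptions only.

```python
# Minimal sketch: gradient descent on a deep linear network with a width-one
# hidden layer under quadratic loss. All dimensions and hyperparameters are
# illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, n = 5, 4, 50          # input/output dimensions, number of samples
widths = [d_in, 6, 1, 6, d_out]    # one hidden layer has a single neuron
X = rng.standard_normal((d_in, n))
Y = rng.standard_normal((d_out, n))

# Arbitrary (not necessarily benign) initialization of the layer matrices W_k.
W = [0.5 * rng.standard_normal((widths[k + 1], widths[k]))
     for k in range(len(widths) - 1)]

def product(W):
    """End-to-end linear map W_L ... W_1."""
    P = np.eye(widths[0])
    for Wk in W:
        P = Wk @ P
    return P

def loss(W):
    """Quadratic loss 0.5 * ||W_L ... W_1 X - Y||_F^2."""
    return 0.5 * np.linalg.norm(product(W) @ X - Y) ** 2

eta = 1e-3  # small step size approximating the continuous-time gradient flow
for step in range(20000):
    R = product(W) @ X - Y                     # residual of the end-to-end map
    grads = []
    for k in range(len(W)):
        # Gradient w.r.t. W_k: (W_L ... W_{k+1})^T R (W_{k-1} ... W_1 X)^T
        pre = np.eye(widths[k + 1])
        for Wj in W[k + 1:]:
            pre = Wj @ pre
        suf = np.eye(widths[0])
        for Wj in W[:k]:
            suf = Wj @ suf
        grads.append(pre.T @ R @ (suf @ X).T)
    W = [Wk - eta * Gk for Wk, Gk in zip(W, grads)]

print("final loss:", loss(W))
```

Depending on the initialization, such trajectories may approach the global minimizer, a saddle point, or the origin, which is exactly the set of outcomes the analysis above classifies.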