We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size) on linear neural network training. We propose a tensor formulation of neural networks that includes fully-connected, diagonal, and convolutional networks as special cases, and investigate the linear version of the formulation called linear tensor networks. With this formulation, we can characterize the convergence direction of the network parameters as singular vectors of a tensor defined by the network. For $L$-layer linear tensor networks that are orthogonally decomposable, we show that gradient flow on separable classification finds a stationary point of the $\ell_{2/L}$ max-margin problem in a "transformed" input space defined by the network. For underdetermined regression, we prove that gradient flow finds a global minimum which minimizes a norm-like function that interpolates between weighted $\ell_1$ and $\ell_2$ norms in the transformed input space. Our theorems subsume existing results in the literature while removing standard convergence assumptions. We also provide experiments that corroborate our analysis.
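As an illustrative sketch of the classification result (with assumed notation not taken from the paper body: data points $x_i$ with labels $y_i \in \{\pm 1\}$, a map $S$ standing in for the network-defined transformation of the input space, and a linear predictor $z$), the $\ell_{2/L}$ max-margin problem in the transformed input space can be written as
$$\min_{z} \;\|z\|_{2/L} \quad \text{subject to} \quad y_i \,\langle S(x_i), z \rangle \ge 1 \quad \text{for all } i.$$
Under this reading, $L = 1$ recovers the standard $\ell_2$ hard-margin problem, while for $L \ge 2$ the $\ell_{2/L}$ quasi-norm favors sparser predictors in the transformed space.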