Implicit regularization in deep learning is perceived as a tendency of gradient-based optimization to fit training data with predictors of minimal "complexity." The fact that only some types of data give rise to generalization is understood to result from their being especially amenable to fitting with low-complexity predictors. A major challenge in formalizing this intuition is to define complexity measures that are quantitative yet capture the essence of data that admits generalization. With an eye towards this challenge, we provide the first analysis of implicit regularization in tensor factorization, which is equivalent to a certain non-linear neural network. We characterize the dynamics that gradient descent induces on the factorization, and establish a bias towards low tensor rank, consistent with empirical evidence. Then, motivated by tensor rank capturing the implicit regularization of a non-linear neural network, we empirically explore it as a measure of complexity, and find that it stays extremely low when fitting standard datasets. This leads us to believe that tensor rank may pave the way to explaining both the implicit regularization of neural networks and the properties of real-world data that translate this implicit regularization into generalization.
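To make the setting concrete, the following is a minimal sketch of gradient descent on a tensor factorization, written for an order-3 tensor completion problem with a squared loss over observed entries. It is an illustration under assumed settings, not the experimental setup of the paper: the tensor size, the number of components `R`, the initialization scale, the learning rate, the step count, the random seeds, and the helper names `make_problem`, `cp_tensor`, and `fit_cp` are all hypothetical choices made for this example.

```python
import numpy as np


def make_problem(d=5, observed_fraction=0.5, scale=3.0, seed=0):
    """Build a rank-one d x d x d target and a random observation mask (assumed setup)."""
    rng = np.random.default_rng(seed)
    # Three unit-norm factor vectors define the rank-one ground truth.
    u, v, w = (x / np.linalg.norm(x) for x in rng.standard_normal((3, d)))
    Y = scale * np.einsum("i,j,k->ijk", u, v, w)
    # mask[i, j, k] is 1.0 where the entry is observed and 0.0 otherwise.
    mask = (rng.random((d, d, d)) < observed_fraction).astype(float)
    return Y, mask


def cp_tensor(A, B, C):
    """Sum of R rank-one terms: W[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def fit_cp(Y, mask, R=5, init_scale=0.1, lr=0.02, steps=10000, seed=1):
    """Plain gradient descent on 0.5 * sum over observed entries of (W - Y)^2."""
    rng = np.random.default_rng(seed)
    d1, d2, d3 = Y.shape
    # Small random initialization of all factor matrices.
    A = init_scale * rng.standard_normal((d1, R))
    B = init_scale * rng.standard_normal((d2, R))
    C = init_scale * rng.standard_normal((d3, R))
    losses = []
    for _ in range(steps):
        # Residual restricted to the observed entries.
        E = mask * (cp_tensor(A, B, C) - Y)
        # Loss is recorded before each update.
        losses.append(0.5 * np.sum(E ** 2))
        # Gradients of the loss with respect to each factor matrix.
        gA = np.einsum("ijk,jr,kr->ir", E, B, C)
        gB = np.einsum("ijk,ir,kr->jr", E, A, C)
        gC = np.einsum("ijk,ir,jr->kr", E, A, B)
        A, B, C = A - lr * gA, B - lr * gB, C - lr * gC
    # Frobenius norm of each rank-one component, sorted from largest to smallest.
    norms = (
        np.linalg.norm(A, axis=0)
        * np.linalg.norm(B, axis=0)
        * np.linalg.norm(C, axis=0)
    )
    return np.array(losses), np.sort(norms)[::-1]


if __name__ == "__main__":
    Y, mask = make_problem()
    losses, norms = fit_cp(Y, mask)
    print("initial loss:", losses[0])
    print("last recorded loss:", losses[-1])
    print("component norms (descending):", norms)
```

Inspecting the sorted component norms is one simple way to see how many components carry the fit. If only a few norms are far from zero, the learned tensor is close to one of low tensor rank. Whether that happens in a given run depends on the assumed hyperparameters above, in particular the initialization scale.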