Matrix factorization is a simple and natural test-bed for investigating the implicit regularization of gradient descent. Gunasekar et al. (2017) conjectured that gradient flow with infinitesimal initialization converges to the solution that minimizes the nuclear norm, but a series of recent papers argued that the language of norm minimization is not sufficient to give a full characterization of the implicit regularization. In this work, we provide theoretical and empirical evidence that, for depth-2 matrix factorization, gradient flow with infinitesimal initialization is mathematically equivalent to a simple heuristic rank minimization algorithm, Greedy Low-Rank Learning, under some reasonable assumptions. This generalizes the rank minimization view from previous works to a much broader setting and enables us to construct counterexamples that refute the conjecture of Gunasekar et al. (2017). We also extend the results to the case of depth $\ge 3$, and we show that the benefit of being deeper is that the above convergence has a much weaker dependence on the initialization magnitude, so that this rank minimization is more likely to take effect for initializations of practical scale.
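For concreteness, one canonical instance of the depth-2 setting above is symmetric matrix factorization trained by gradient flow from an infinitesimally small initialization; the notation below (target matrix $\mathbf{M}^{*}$, initialization scale $\alpha$, random direction $\mathbf{U}_{0}$) is illustrative rather than fixed by this abstract:
\begin{equation*}
  \min_{\mathbf{U} \in \mathbb{R}^{d \times d}} \; L(\mathbf{U}) := \tfrac{1}{2} \big\lVert \mathbf{U}\mathbf{U}^{\top} - \mathbf{M}^{*} \big\rVert_{F}^{2},
  \qquad
  \frac{\mathrm{d}\mathbf{U}(t)}{\mathrm{d}t} = -\nabla L\big(\mathbf{U}(t)\big),
  \qquad
  \mathbf{U}(0) = \alpha \mathbf{U}_{0}, \quad \alpha \to 0^{+}.
\end{equation*}
Informally, the equivalence with Greedy Low-Rank Learning says that in this small-$\alpha$ limit the learned matrix $\mathbf{U}(t)\mathbf{U}(t)^{\top}$ picks up components greedily, so its effective rank grows one step at a time over the course of training.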