It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in training machine learning models. This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem, whose goal is to recover a low-rank ground-truth matrix from near-isotropic linear measurements. It is shown that GD with small initialization behaves similarly to the greedy low-rank learning heuristic (Li et al., 2020) and follows an incremental learning procedure (Gissin et al., 2019): GD sequentially learns solutions with increasing ranks until it recovers the ground-truth matrix. In contrast to existing works that analyze only the first learning phase, in which rank-1 solutions are learned, our result characterizes the whole learning process. Moreover, besides the over-parameterized regime that many prior works focus on, our analysis of the incremental learning procedure also applies to the under-parameterized regime. Finally, we conduct numerical experiments to confirm our theoretical findings.
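To make the setup concrete, below is a minimal numerical sketch of the phenomenon the abstract describes: GD with small initialization on a factorized matrix sensing objective, where the top singular values of the iterate emerge one at a time. All sizes and hyperparameters (d, m, alpha, lr, the spectrum of the ground truth) are illustrative assumptions, not values from the paper.

import numpy as np

# Illustrative sketch: GD with small initialization on matrix sensing.
rng = np.random.default_rng(0)
d, m = 20, 600                                  # dimension, number of measurements

# Rank-2 PSD ground-truth matrix with a well-separated spectrum.
U_star, _ = np.linalg.qr(rng.normal(size=(d, 2)))
M_star = (U_star * np.array([1.0, 0.25])) @ U_star.T

# Symmetrized Gaussian measurement matrices A_i, observations y_i = <A_i, M*>.
A = rng.normal(size=(m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2
y = np.einsum('kij,ij->k', A, M_star)

# Over-parameterized factorization M = X X^T, initialization of small scale alpha.
alpha, lr, steps = 1e-4, 0.1, 1200
X = alpha * rng.normal(size=(d, d))

for t in range(steps + 1):
    residual = np.einsum('kij,ij->k', A, X @ X.T) - y      # r_i = <A_i, XX^T> - y_i
    if t % 150 == 0:
        # The top singular values of XX^T grow one at a time: first the
        # rank-1 component, then the rank-2 one -- incremental learning.
        s = np.linalg.svd(X @ X.T, compute_uv=False)[:3]
        print(f"step {t:4d}  top singular values of XX^T: {np.round(s, 3)}")
    # Gradient of the loss (1/(2m)) * sum_i r_i^2 with respect to X
    # (using that each A_i is symmetric).
    X -= lr * (2.0 / m) * np.einsum('k,kij->ij', residual, A) @ X

Running this sketch prints the top singular values of XX^T over training: the leading singular value saturates first while the rest remain near zero, and the second one only grows later, matching the rank-by-rank learning procedure described above.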