We study the asymmetric matrix factorization problem under a natural nonconvex formulation with arbitrary overparametrization. We consider the model-free setting, with minimal assumptions on the rank or singular values of the observed matrix, in which the global optima provably overfit. We show that vanilla gradient descent with small random initialization sequentially recovers the principal components of the observed matrix. Consequently, when equipped with proper early stopping, gradient descent produces the best low-rank approximation of the observed matrix without any explicit regularization. We provide a sharp characterization of the relationship between the approximation error, iteration complexity, initialization size, and stepsize. Our complexity bound is almost dimension-free and depends logarithmically on the approximation error, with significantly more lenient requirements on the stepsize and initialization compared to prior work. Our theoretical results provide accurate predictions for the behavior of gradient descent, showing good agreement with numerical experiments.
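The phenomenon described above can be illustrated numerically. The following is a minimal sketch, not the paper's experimental setup: it builds a hypothetical noisy low-rank matrix, runs vanilla gradient descent on the overparametrized factorized loss $\tfrac{1}{2}\|M - XY^\top\|_F^2$ from a small random initialization, and compares the iterate (before the noise is overfit) against the best rank-$r$ approximation computed via truncated SVD. All problem sizes, the initialization scale, the stepsize, and the iteration count are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed matrix: low-rank signal plus small noise.
n, m, r_true = 30, 20, 3
M = rng.standard_normal((n, r_true)) @ rng.standard_normal((r_true, m))
M += 0.01 * rng.standard_normal((n, m))  # global optima of the factorized loss fit this noise too

# Overparametrized factors M ~ X @ Y.T with small random initialization.
k = 10          # inner dimension, deliberately larger than r_true
alpha = 1e-3    # small initialization scale
eta = 0.01      # stepsize
X = alpha * rng.standard_normal((n, k))
Y = alpha * rng.standard_normal((m, k))

# Best rank-r approximation of M for comparison (truncated SVD).
U, s, Vt = np.linalg.svd(M, full_matrices=False)
r = 3
M_r = (U[:, :r] * s[:r]) @ Vt[:r]

# Vanilla gradient descent on 0.5 * ||M - X Y^T||_F^2,
# updating both factors simultaneously.
for _ in range(5000):
    R = X @ Y.T - M                              # residual
    X, Y = X - eta * R @ Y, Y - eta * R.T @ X    # old X used in Y's update

# With small initialization, the dominant components grow first, so an
# early-stopped iterate tracks the best rank-r approximation of M.
err = np.linalg.norm(X @ Y.T - M_r) / np.linalg.norm(M_r)
print(err)
```

In this sketch the leading singular directions escape the small initialization fastest (their growth rate scales with the corresponding singular value), which is the mechanism behind the sequential recovery of principal components; stopping before the noise directions grow plays the role of the early stopping discussed above.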