We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. While recent works have made important advances in the study of the optimization problem for overparametrized low-rank matrix approximation, much emphasis has been placed on discriminative settings and the square loss. In contrast, our model considers another type of loss and connects with the generative setting. We characterize the critical points and minimizers of the Bures-Wasserstein distance over the space of rank-bounded matrices. For low-rank matrices, the Hessian of this loss can theoretically blow up, which makes it challenging to analyze the convergence of optimization methods. We establish convergence results for gradient flow using a smooth perturbative version of the loss, as well as convergence results for finite-step-size gradient descent under certain assumptions on the initial weights.
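For reference, the squared Bures-Wasserstein distance between two positive semidefinite matrices coincides with the squared 2-Wasserstein distance between the corresponding centered Gaussians; the standard formula is recalled below. The factorized parametrization shown in the comment is an illustrative assumption and need not match the paper's exact notation.

% Standard squared Bures--Wasserstein distance between PSD matrices
% \Sigma_1, \Sigma_2 (equivalently, the squared 2-Wasserstein distance
% between the centered Gaussians N(0, \Sigma_1) and N(0, \Sigma_2)).
\[
  \mathrm{BW}^2(\Sigma_1, \Sigma_2)
  \;=\;
  \operatorname{tr}(\Sigma_1) + \operatorname{tr}(\Sigma_2)
  - 2\,\operatorname{tr}\!\Big(\big(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\big)^{1/2}\Big).
\]
% Illustrative deep factorization of the model covariance (assumed notation):
% \Sigma(W) = (W_N \cdots W_1)(W_N \cdots W_1)^\top, trained by minimizing
% L(W) = \mathrm{BW}^2(\Sigma(W), \Sigma_\star) for a target covariance \Sigma_\star.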