Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent have been proposed, which, unfortunately, fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a novel Riemannian sub-gradient (RsGrad) algorithm which is not only computationally efficient, with linear convergence, but also statistically optimal, whether the noise is Gaussian or heavy-tailed. Convergence theory is established for a general framework, and specific applications to absolute loss, Huber loss, and quantile loss are investigated. Compared with existing non-convex methods, ours reveals a surprising phenomenon of dual-phase convergence. In phase one, RsGrad behaves as in typical non-smooth optimization, requiring gradually decaying stepsizes. However, phase one only delivers a statistically sub-optimal estimator, a limitation already observed in the existing literature. Interestingly, during phase two, RsGrad converges linearly as if it were minimizing a smooth and strongly convex objective function, so a constant stepsize suffices. Underlying the phase-two convergence is the smoothing effect of random noise on the non-smooth robust losses in a region close, but not too close, to the truth. Lastly, RsGrad is applicable to low-rank tensor estimation under heavy-tailed noise, where a statistically optimal rate is attainable with the same phenomenon of dual-phase convergence, and a novel shrinkage-based second-order moment method is guaranteed to deliver a warm initialization. Numerical simulations confirm our theoretical discoveries and showcase the superiority of RsGrad over prior methods.
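To make the algorithmic recipe concrete, below is a minimal NumPy sketch of Riemannian sub-gradient descent for rank-r matrix denoising under the absolute loss, with a geometrically decaying stepsize in phase one and a constant stepsize in phase two. The specific design choices here (tangent-space projection followed by a truncated-SVD retraction, normalized sub-gradient directions) and all hyperparameters (`q`, `T1`, `T2`, the stepsize scaling) are illustrative assumptions for exposition, not the paper's exact specification.

```python
import numpy as np

def retract(X, r):
    """Retract onto the rank-r matrix manifold via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, Vt = U[:, :r], s[:r], Vt[:r, :]
    return (U * s) @ Vt, U, Vt.T

def rsgrad(Y, r, q=0.9, T1=60, T2=200):
    """Riemannian sub-gradient sketch for rank-r denoising under absolute loss.

    Stepsizes decay geometrically in phase one and stay constant in phase two;
    the concrete constants are illustrative, not the paper's theoretical ones.
    """
    X, U, V = retract(Y, r)                   # spectral initialization
    eta0 = 0.1 * np.linalg.norm(X)            # illustrative initial stepsize
    for t in range(T1 + T2):
        G = np.sign(X - Y)                    # sub-gradient of sum_{ij} |X_ij - Y_ij|
        # Project onto the tangent space of the rank-r manifold at X = U S V^T:
        # P_T(G) = U U^T G + G V V^T - U U^T G V V^T.
        PG = U @ (U.T @ G) + (G @ V) @ V.T - U @ ((U.T @ G) @ V) @ V.T
        PG /= max(np.linalg.norm(PG), 1e-12)  # normalized sub-gradient direction
        eta = eta0 * q ** t if t < T1 else eta0 * q ** T1  # dual-phase schedule
        X, U, V = retract(X - eta * PG, r)    # Riemannian step + retraction
    return X

# Toy run: rank-2 truth corrupted by heavy-tailed Student-t (df = 1.5) noise.
rng = np.random.default_rng(0)
n, r = 100, 2
M_star = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
Y = M_star + 0.1 * rng.standard_t(1.5, size=(n, n))
M_hat = rsgrad(Y, r)
print("relative error:", np.linalg.norm(M_hat - M_star) / np.linalg.norm(M_star))
```

In this toy setup the phase-two stepsize is simply frozen at its value at the end of phase one; the paper's theory prescribes how the switch point and constant stepsize should actually be chosen.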