Outlier-robust additive matrix decomposition and robust matrix completion

from arxiv, This paper studies a broader model but shares content with arXiv:2012.06750 (which will not be further revised). Correction of typos. Unlike mentioned in arXiv:2012.06750, (2018) Bellec et DOES achieve the optimal rate for uncorrupted sparse linear regression (but assuming noise independent of features)

We study least-squares trace regression when the parameter is the sum of a $r$-low-rank and a $s$-sparse matrices and a fraction $\epsilon$ of the labels is corrupted. For subgaussian distributions, we highlight three design properties. The first, termed $\PP$, handles additive decomposition and follows from a product process inequality. The second, termed $\IP$, handles both label contamination and additive decomposition. It follows from Chevet's inequality. The third, termed $\MP$, handles the interaction between the design and featured-dependent noise. It follows from a multiplier process inequality. Jointly, these properties entail the near-optimality of a tractable estimator with respect to the effective dimensions $d_{\eff,r}$ and $d_{\eff,s}$ for the low-rank and sparse components, $\epsilon$ and the failure probability $\delta$. This rate has the form $$ \mathsf{r}(n,d_{\eff,r}) + \mathsf{r}(n,d_{\eff,s}) + \sqrt{(1+\log(1/\delta))/n} + \epsilon\log(1/\epsilon). $$ Here, $\mathsf{r}(n,d_{\eff,r})+\mathsf{r}(n,d_{\eff,s})$ is the optimal uncontaminated rate, independent of $\delta$. Our estimator is adaptive to $(s,r,\epsilon,\delta)$ and, for fixed absolute constant $c>0$, it attains the mentioned rate with probability $1-\delta$ uniformly over all $\delta\ge\exp(-cn)$. Disconsidering matrix decomposition, our analysis also entails optimal bounds for a robust estimator adapted to the noise variance. Finally, we consider robust matrix completion. We highlight a new property for this problem: one can robustly and optimally estimate the incomplete matrix regardless of the \emph{magnitude of the corruption}. Our estimators are based on ``sorted'' versions of Huber's loss. We present simulations matching the theory. In particular, it reveals the superiority of ``sorted'' Huber loss over the classical Huber's loss.

翻译：暂无翻译