We consider the least-squares regression problem with unknown noise variance, where the observed data points are allowed to be corrupted by outliers. Building on the median-of-means (MOM) method introduced by Lecué and Lerasle [Ann. Statist. 48(2):906-931 (April 2020)] for the case of known noise variance, we propose a general MOM approach for simultaneous inference of both the regression function and the noise variance, requiring only an upper bound on the noise level. This generalization requires care because of regularity issues that are intrinsic to the underlying convex-concave optimization problem. In the general case where the regression function belongs to a convex class, we show that our simultaneous estimator achieves, with high probability, the same convergence rates and a similar risk bound as if the noise level were known, and we also obtain convergence rates for the estimated noise standard deviation. In the high-dimensional sparse linear setting, our estimator yields a robust analog of the square-root LASSO. Under weak moment conditions, it jointly achieves, with high probability, the minimax rates of estimation $s^{1/p} \sqrt{(1/n) \log(p/s)}$ for the $\ell_p$-norm of the coefficient vector, and the rate $\sqrt{(s/n) \log(p/s)}$ for the estimation of the noise standard deviation. Here $n$ denotes the sample size, $p$ the dimension and $s$ the sparsity level. Finally, we propose an extension to the case of unknown sparsity level $s$, providing a jointly adaptive estimator $(\widetilde \beta, \widetilde \sigma, \widetilde s)$. It simultaneously estimates the coefficient vector, the noise level and the sparsity level, with proven bounds on each of these three components that hold with high probability.
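
As background for the MOM principle referred to above, the following is a minimal sketch of the basic scalar median-of-means estimator of a mean. It is not the joint regression and noise-level estimator proposed in this work; the function name `median_of_means` and its parameters `n_blocks` and `rng` are illustrative assumptions. The idea is to split the sample into blocks, average within each block, and take the median of the block averages, so that a small number of outliers can corrupt only a minority of blocks.

```python
import numpy as np


def median_of_means(x, n_blocks, rng=None):
    """Median-of-means estimate of the mean of a 1-D sample.

    Illustrative sketch only: `n_blocks` and `rng` are hypothetical
    parameter names, not notation taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    if not 1 <= n_blocks <= x.size:
        raise ValueError("n_blocks must be between 1 and len(x)")
    rng = np.random.default_rng(rng)
    # Shuffle so that block membership does not depend on the data order.
    shuffled = rng.permutation(x)
    # Split into n_blocks blocks of (nearly) equal size.
    blocks = np.array_split(shuffled, n_blocks)
    # Average within each block, then take the median across blocks.
    block_means = np.array([block.mean() for block in blocks])
    return float(np.median(block_means))


if __name__ == "__main__":
    gen = np.random.default_rng(0)
    clean = gen.normal(size=1000)                      # true mean is 0
    data = np.concatenate([clean, np.full(10, 1e6)])   # 10 gross outliers
    print("empirical mean :", data.mean())
    print("median of means:", median_of_means(data, n_blocks=31, rng=1))
```

With 10 outliers and 31 blocks, at most 10 block averages can be contaminated, so the median is necessarily taken over an uncontaminated block average, whereas the empirical mean is dominated by the outliers. Choosing the number of blocks larger than twice the number of outliers is the usual rule of thumb for this kind of sketch.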