We consider the high-dimensional linear regression model and assume that a fraction of the responses are contaminated by an adversary with complete knowledge of the data and the underlying distribution. We are interested in the situation in which the dense additive noise can be heavy-tailed but the predictors have a sub-Gaussian distribution. We establish minimax lower bounds that depend on the fraction of contaminated data and the tails of the additive noise. Moreover, we design a modification of the square root Slope estimator with several desirable features: (a) it is provably robust to adversarial contamination, with performance guarantees that take the form of sub-Gaussian deviation inequalities and match the lower bounds up to logarithmic factors; (b) it is fully adaptive with respect to the unknown sparsity level and the variance of the noise; and (c) it is computationally tractable as the solution of a convex optimization problem. To analyze the performance of the proposed estimator, we prove several properties of matrices with sub-Gaussian rows that could be of independent interest.
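To make the shape of such an estimator concrete, the following is a minimal sketch of a convex objective that combines a square-root (unsquared) residual loss with sorted-L1 (Slope) penalties on both the regression vector and an explicit contamination vector. It is an illustration under stated assumptions, not the estimator defined in the paper: the function names, the scaling of the contamination vector `theta` by sqrt(n), and the weight sequence in `slope_weights` are all hypothetical choices made here.

```python
import math


def slope_norm(v, weights):
    """Sorted-L1 (Slope) norm: sum_j weights[j] * |v|_(j),
    where |v|_(1) >= |v|_(2) >= ... are the sorted magnitudes of v."""
    magnitudes = sorted((abs(x) for x in v), reverse=True)
    return sum(w * m for w, m in zip(weights, magnitudes))


def slope_weights(dim, n, scale=1.0):
    """Assumed (hypothetical) non-increasing weights:
    scale * sqrt(log(2 * dim / j) / n) for j = 1..dim."""
    return [scale * math.sqrt(math.log(2.0 * dim / j) / n)
            for j in range(1, dim + 1)]


def robust_sqrt_slope_objective(X, y, beta, theta, lam, mu):
    """Assumed objective:
        ||y - X beta - sqrt(n) theta||_2 / sqrt(n)
        + slope_norm(beta, lam) + slope_norm(theta, mu).
    X is a list of n rows, y a list of n responses, beta the regression
    vector, and theta a length-n vector modelling contaminated responses."""
    n = len(y)
    root_n = math.sqrt(n)
    residuals = [
        y[i] - sum(x_ij * b_j for x_ij, b_j in zip(X[i], beta)) - root_n * theta[i]
        for i in range(n)
    ]
    # Unsquared residual norm: the "square root" loss.
    sqrt_loss = math.sqrt(sum(r * r for r in residuals)) / root_n
    return sqrt_loss + slope_norm(beta, lam) + slope_norm(theta, mu)
```

Each term is a norm (or a norm composed with an affine map), so the objective is jointly convex in `beta` and `theta`. Leaving the residual norm unsquared is the standard device that lets square-root-type estimators be tuned without knowing the noise level, and the decreasing weights are the usual Slope mechanism for adapting to unknown sparsity.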