We propose a novel method for estimating the coefficients of linear regression when both outputs and inputs are contaminated by malicious outliers. Our method consists of two steps: (i) construct weights $\left\{\hat{w}_i\right\}_{i=1}^n$ such that the weighted sample mean of the regression covariates robustly estimates their population mean; (ii) run Huber regression using $\left\{\hat{w}_i\right\}_{i=1}^n$. When (a) the regression covariates are i.i.d. random vectors drawn from a sub-Gaussian distribution with unknown mean and known identity covariance and (b) the absolute moment of the random noise is finite, our method attains a faster convergence rate than those of Diakonikolas, Kong and Stewart (2019) and Cherapanamjeri et al. (2020). Furthermore, our result is minimax optimal up to a constant factor. When (a) the regression covariates are i.i.d. random vectors drawn from a heavy-tailed distribution with unknown mean and bounded kurtosis and (b) the absolute moment of the random noise is finite, our method attains a convergence rate that is minimax optimal up to a constant factor.
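To make the two-step structure concrete, the following is a minimal Python sketch, not the estimator analysed in the paper. Step (i) is replaced by a simple stand-in (uniform weights on the points closest to the coordinate-wise median, zero weight on the rest), and step (ii) minimises a weighted Huber loss by iteratively reweighted least squares. The function names, the trimming fraction `trim_frac`, the Huber threshold `delta`, and the synthetic contamination are all assumptions chosen for illustration.

```python
import numpy as np

def trimming_weights(X, trim_frac=0.1):
    """Stand-in for step (i): NOT the paper's weight construction.
    Uniform weights on the points closest to the coordinate-wise median,
    zero weight on the farthest trim_frac fraction of points."""
    n = X.shape[0]
    center = np.median(X, axis=0)
    dist = np.linalg.norm(X - center, axis=1)
    keep = n - int(np.floor(trim_frac * n))
    idx = np.argsort(dist)[:keep]
    w = np.zeros(n)
    w[idx] = 1.0 / keep
    return w

def weighted_huber_regression(X, y, w, delta=1.345, n_iter=100, tol=1e-10):
    """Step (ii): minimise sum_i w_i * huber_delta(y_i - x_i^T beta)
    by iteratively reweighted least squares (no intercept term)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        r = y - X @ beta
        abs_r = np.maximum(np.abs(r), 1e-12)
        # IRLS factor for the Huber loss: 1 inside [-delta, delta], delta/|r| outside.
        h = np.where(abs_r <= delta, 1.0, delta / abs_r)
        s = w * h
        beta_new = np.linalg.solve(X.T @ (s[:, None] * X), X.T @ (s * y))
        converged = np.linalg.norm(beta_new - beta) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Synthetic data (assumed setup): Gaussian covariates with nonzero mean,
# identity covariance, and unit-variance Gaussian noise.
rng = np.random.default_rng(0)
n, d = 500, 3
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(loc=1.0, scale=1.0, size=(n, d))
y = X @ beta_true + rng.normal(size=n)

# Contamination: 25 rows with corrupted inputs and outputs,
# and 25 further rows with corrupted outputs only.
X[:25] = 50.0
y[:25] = -1000.0
y[25:50] += 100.0

w = trimming_weights(X, trim_frac=0.1)
beta_hat = weighted_huber_regression(X, y, w)
print("estimation error:", np.linalg.norm(beta_hat - beta_true))
```

In this sketch the weights neutralise the corrupted covariates and the Huber loss limits the influence of the corrupted outputs. The convergence rates stated in the abstract apply to the authors' weight construction, not to this simplified trimming rule.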