We propose a novel method for estimating the coefficients of a linear regression when both outputs and inputs are contaminated by malicious outliers. Our method consists of two steps: (i) construct weights $\left\{\hat{w}_i\right\}_{i=1}^n$ such that the weighted sample mean of the regression covariates robustly estimates their population mean; (ii) run Huber regression using $\left\{\hat{w}_i\right\}_{i=1}^n$. When (a-1) the regression covariates are i.i.d. random vectors drawn from a sub-Gaussian distribution satisfying $L_4$-$L_2$ norm equivalence, with unknown mean and known identity covariance, and (a-2) the absolute moment of the random noise is finite, our method attains a convergence rate that is information-theoretically optimal up to a constant factor with respect to the noise term. When (b-1) the regression covariates are i.i.d. random vectors drawn from a heavy-tailed distribution satisfying $L_4$-$L_2$ norm equivalence, with unknown mean, and (b-2) the absolute moment of the random noise is finite, our method attains a convergence rate that is information-theoretically optimal up to a constant factor.
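The two-step structure can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify how the weights $\hat{w}_i$ are computed, so step (i) is replaced here by a simple trimming rule (zero weight for the points farthest from the coordinate-wise median, uniform weight otherwise), and step (ii) is a weighted Huber regression solved by iteratively reweighted least squares. The function names, `trim_frac`, `delta`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def covariate_weights(X, trim_frac=0.1):
    """Stand-in for step (i), NOT the paper's weighting scheme (assumption).

    Gives zero weight to the trim_frac fraction of rows farthest from the
    coordinate-wise median and uniform weight to the remaining rows.
    """
    n = X.shape[0]
    center = np.median(X, axis=0)
    dist = np.linalg.norm(X - center, axis=1)
    keep = n - int(np.floor(trim_frac * n))
    w = np.zeros(n)
    w[np.argsort(dist)[:keep]] = 1.0 / keep
    return w

def weighted_huber_regression(X, y, w, delta=1.345, n_iter=200, tol=1e-10):
    """Step (ii): minimize sum_i w_i * Huber_delta(y_i - x_i^T beta) via IRLS."""
    d = X.shape[1]
    beta = np.zeros(d)
    for _ in range(n_iter):
        r = y - X @ beta
        abs_r = np.maximum(np.abs(r), 1e-12)  # avoid division by zero
        # IRLS factor for the Huber loss: 1 if |r| <= delta, delta/|r| otherwise
        h = np.minimum(1.0, delta / abs_r)
        s = w * h
        A = X.T @ (s[:, None] * X)
        b = X.T @ (s * y)
        beta_new = np.linalg.solve(A, b)
        converged = np.linalg.norm(beta_new - beta) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Synthetic example (all values hypothetical): Gaussian covariates with
# nonzero mean and identity covariance, 5% of rows corrupted in both X and y.
rng = np.random.default_rng(0)
n, d, n_out = 500, 3, 25
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(loc=1.0, scale=1.0, size=(n, d))
y = X @ beta_true + 0.5 * rng.normal(size=n)
X[:n_out] = 20.0      # corrupted inputs
y[:n_out] = -1000.0   # corrupted outputs

w_hat = covariate_weights(X, trim_frac=0.1)
beta_hat = weighted_huber_regression(X, y, w_hat)
print("estimate:", beta_hat)
print("l2 error:", np.linalg.norm(beta_hat - beta_true))
```

In this sketch the weights depend only on the covariates, so they guard against corrupted inputs, while the Huber loss limits the influence of corrupted outputs. The trimming rule carries none of the guarantees stated above; it only shows where a robust weighting step would plug in.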