We investigate robust linear regression where data may be contaminated by an oblivious adversary, i.e., an adversary that may know the data distribution but is otherwise oblivious to the realizations of the data samples. This model has been previously analyzed under strong assumptions. Concretely, $\textbf{(i)}$ all previous works assume that the covariance matrix of the features is positive definite; and $\textbf{(ii)}$ most of them assume that the features are centered (i.e., zero mean). Moreover, all previous works make further restrictive assumptions, e.g., that the features are Gaussian or that the corruptions are symmetrically distributed. In this work we go beyond these assumptions and investigate robust regression under a more general set of assumptions: $\textbf{(i)}$ we allow the covariance matrix to be either positive definite or positive semidefinite, $\textbf{(ii)}$ we do not necessarily assume that the features are centered, $\textbf{(iii)}$ we make no further assumption beyond boundedness (sub-Gaussianity) of features and measurement noise. Under these assumptions we analyze a natural SGD variant for this problem and show that it enjoys a fast convergence rate when the covariance matrix is positive definite. In the positive semidefinite case we show that there are two regimes: if the features are centered we can obtain a standard convergence rate; otherwise the adversary can cause any learner to fail arbitrarily.
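To make the setting concrete, here is a minimal simulation sketch. The abstract does not specify which SGD variant is analyzed, so the code below is only an illustrative assumption: plain single-pass SGD on the absolute loss $|\langle w, x\rangle - y|$ with a $1/\sqrt{t}$ step size. The function names (`make_data`, `sgd_abs_loss`), the uniform feature and noise distributions, the corruption range, and all constants are hypothetical choices made for this example and are not taken from the paper. The features are centered and bounded, and the corruptions are one-sided (not symmetric) but drawn independently of the features, which is what makes the adversary oblivious.

```python
import random


def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def make_data(n, w_true, corrupt_frac, rng):
    """Sample n pairs (x, y) with y = <w_true, x> + noise + corruption.

    Assumptions for this sketch only: features are uniform on [-1, 1]
    (bounded and centered), noise is uniform on [-0.1, 0.1], and a
    corrupt_frac fraction of labels gets a large one-sided shift.
    The shift is drawn without looking at x, so it is oblivious.
    """
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in w_true]
        noise = rng.uniform(-0.1, 0.1)
        shift = rng.uniform(0.0, 50.0) if rng.random() < corrupt_frac else 0.0
        y = sum(wi * xi for wi, xi in zip(w_true, x)) + noise + shift
        data.append((x, y))
    return data


def sgd_abs_loss(data, dim, lr0=0.5):
    """Single-pass SGD on the absolute loss |<w, x> - y|.

    A subgradient of the loss at w is sign(<w, x> - y) * x, so every
    update has bounded size no matter how large the corruption is.
    """
    w = [0.0] * dim
    for t, (x, y) in enumerate(data, start=1):
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        step = lr0 / t ** 0.5  # hypothetical step-size schedule
        g = sign(residual)
        w = [wi - step * g * xi for wi, xi in zip(w, x)]
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    w_true = [1.0, -2.0, 0.5]
    samples = make_data(20000, w_true, corrupt_frac=0.3, rng=rng)
    w_hat = sgd_abs_loss(samples, dim=len(w_true))
    print("true weights:     ", w_true)
    print("estimated weights:", [round(v, 3) for v in w_hat])
```

The absolute loss is used here because its subgradient depends on the residual only through its sign, so a corrupted label cannot produce an arbitrarily large update. With centered features, a shift that is independent of $x$ contributes zero to the expected subgradient at the true weights, which is consistent with the role that centering plays in the abstract.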