This paper analyzes $\ell_1$ regularized linear regression under the challenging scenario of having only adversarially corrupted data for training. We use the primal-dual witness paradigm to provide provable performance guarantees for the support of the estimated regression parameter vector to match the actual parameter. Our theoretical analysis shows the counter-intuitive result that an adversary can influence sample complexity by corrupting the irrelevant features, i.e., those corresponding to zero coefficients of the regression parameter vector, which, consequently, do not affect the dependent variable. As any adversarially robust algorithm has its limitations, our theoretical analysis identifies the regimes under which the learning algorithm and adversary can dominate over each other. It helps us to analyze these fundamental limits and address critical scientific questions of which parameters (like mutual incoherence, the maximum and minimum eigenvalue of the covariance matrix, and the budget of adversarial perturbation) play a role in the high or low probability of success of the LASSO algorithm. Also, the derived sample complexity is logarithmic with respect to the size of the regression parameter vector, and our theoretical claims are validated by empirical analysis on synthetic and real-world datasets.
翻译:本文在具有挑战性的情况下分析了美元/ ell_ 1美元正常的线性回归。 我们使用原始双证人模式为支持估计回归参数矢量以符合实际参数提供可证实的性能保障。 我们的理论分析显示反直觉结果,即对手可以通过腐蚀不相干特性,即与回归参数矢量零系数相对应的特性,影响样本复杂性,因此不会影响依赖变量。由于任何对抗性强的算法都有其局限性,我们的理论分析确定了学习算法和对手可相互支配的制度。它帮助我们分析这些基本限度,并解决关键科学问题,其中哪些参数(如相互不一致性、共变矩阵的最大和最小值,以及对抗性扰动预算)在LASSO算法成功率高或低概率方面起着作用。此外,衍生的样本复杂性与回归参数矢量的大小有关,而且我们的理论主张得到合成和真实数据分析的验证。