We consider the robust linear regression model $\boldsymbol{y} = X\beta^* + \boldsymbol{\eta}$, where an adversary oblivious to the design $X \in \mathbb{R}^{n \times d}$ may choose $\boldsymbol{\eta}$ to corrupt all but a (possibly vanishing) fraction of the observations $\boldsymbol{y}$ in an arbitrary way. Recent work [dLN+21, dNS21] has introduced efficient algorithms for consistent recovery of the parameter vector. These algorithms crucially rely on the design matrix being well-spread (a matrix is well-spread if its column span is far from any sparse vector). In this paper, we show that there exists a family of design matrices lacking well-spreadness such that consistent recovery of the parameter vector in the above robust linear regression model is information-theoretically impossible. We further investigate the average-case time complexity of certifying well-spreadness of random matrices. We show that it is possible to efficiently certify whether a given $n$-by-$d$ Gaussian matrix is well-spread if the number of observations is quadratic in the ambient dimension. We complement this result by showing rigorous evidence -- in the form of a lower bound against low-degree polynomials -- of the computational hardness of this same certification problem when the number of observations is $o(d^2)$.
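As an illustrative aside (not part of the paper's results), the well-spreadness notion above can be spot-checked numerically: sample random unit vectors from the column span of a Gaussian matrix and measure how much $\ell_2$ mass the largest coordinates carry. The sketch below is a heuristic Monte Carlo probe, not a certificate; the choices of `n`, `d`, the subset size `k`, and the number of trials are arbitrary assumptions for illustration. Actually *certifying* well-spreadness is exactly the problem whose average-case complexity the paper studies.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 10
X = rng.standard_normal((n, d))  # Gaussian design matrix

# Orthonormal basis for the column span of X.
Q, _ = np.linalg.qr(X)

# Heuristic spot-check: for random unit vectors v in the span,
# record the largest l2 mass carried by any k coordinates.
# Well-spreadness asks that this mass be bounded away from 1
# for EVERY span vector; sampling only probes typical ones.
k = n // 10  # size of the "sparse" coordinate subset
worst = 0.0
for _ in range(200):
    c = rng.standard_normal(d)
    v = Q @ (c / np.linalg.norm(c))  # unit vector in the span
    top_mass = np.sort(v**2)[-k:].sum()
    worst = max(worst, top_mass)

print(worst)  # small value: no sampled span vector concentrates on few coordinates
```

Note that a small `worst` over sampled directions is only evidence, not proof: a certificate must rule out concentration for all $\binom{n}{k}$ coordinate subsets and all span directions simultaneously, which is where the quadratic-sample threshold in the abstract enters.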