The problem of robust mean estimation in high dimensions is studied, in which a certain fraction (less than half) of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, the robust mean estimation problem is formulated as the minimization of the $\ell_0$-`norm' of an \emph{outlier indicator vector}, under a second moment constraint on the datapoints. The $\ell_0$-`norm' is then relaxed to the $\ell_p$-norm ($0<p\leq 1$) in the objective, and it is shown that the global minima for each of these objectives are order-optimal and have optimal breakdown point for the robust mean estimation problem. Furthermore, a computationally tractable iterative $\ell_p$-minimization and hard thresholding algorithm is proposed that outputs an order-optimal robust estimate of the population mean. The proposed algorithm (with breakdown point $\approx 0.3$) does not require prior knowledge of the fraction of outliers, in contrast with most existing algorithms, and for $p=1$ it has near-linear time complexity. Both synthetic and real data experiments demonstrate that the proposed algorithm outperforms state-of-the-art robust mean estimation methods.
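To make the formulation concrete, here is a sketch under assumed notation. The symbols $y_i$, $x$, $h$, $\sigma^2$, and the constant $c$ are not defined in this abstract and are introduced here only for illustration. Let $y_1,\dots,y_n \in \mathbb{R}^d$ be the datapoints, $x$ the candidate mean, and $h \in \{0,1\}^n$ the outlier indicator vector, with $h_i = 1$ marking $y_i$ as an outlier. One plausible reading of the $\ell_0$ objective with a second moment constraint on the retained points is
\[
\min_{x \in \mathbb{R}^d,\; h \in \{0,1\}^n} \; \|h\|_0
\quad \text{subject to} \quad
\lambda_{\max}\!\left( \sum_{i=1}^{n} (1-h_i)\,(y_i - x)(y_i - x)^{\top} \right) \le c\, n\, \sigma^2 ,
\]
where $\sigma^2$ is an assumed bound on the variance of the inlier distribution in any direction. Under the same assumptions, the relaxed problem would replace $\|h\|_0$ with $\|h\|_p$ for $0<p\leq 1$ and allow $h \in [0,1]^n$; the exact constraint and constants used in the paper may differ.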