We study the robust mean estimation problem in high dimensions, where less than half of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, we formulate the robust mean estimation problem as the minimization of the $\ell_0$-`norm' of an \emph{outlier indicator vector}, under a second moment constraint on the datapoints. We further relax the $\ell_0$-`norm' to the $\ell_p$-norm ($0<p\leq 1$) in the objective and prove that the global minima for each of these objectives are order-optimal for the robust mean estimation problem. Then we propose a computationally tractable iterative $\ell_p$-minimization and hard thresholding algorithm that outputs an order-optimal robust estimate of the population mean. Both synthetic and real data experiments demonstrate that the proposed algorithm outperforms state-of-the-art robust mean estimation methods. The source code will be made available at GitHub.
翻译:我们研究高维的稳健平均值估算问题, 其中不到一半的数据点可能被任意腐蚀。 受压缩感的驱使, 我们将稳健的平均值估算问题表述为 \ emph{ 外部指示矢量 $_ 0$- norm 最小化和硬阈值算法, 在数据点的第二个时刻限制下 。 我们进一步将 $\ ell_ 0$- norm 放松到 $\ ell_ p$-norm (0< p\leq 1$ ) 的目标中, 并证明其中每一个目标的全球微型值都是稳健的平均值问题的最佳秩序。 然后我们提出一个可计算可移动的迭代值 $\ ell_ p$- 最小化和硬阈值算法, 以产生对人口值的有序- 最佳可靠估计值。 合成和真实的数据实验都表明, 提议的算法将超越最强的稳健的平均值估算方法。 源代码将在 GitHub 提供 。