Robust estimation is much more challenging in high dimensions than it is in one dimension: Most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and suddenly make high-dimensional robust estimation a realistic possibility.
翻译:强力估计在高维方面比一个维度更具挑战性:大多数技术要么导致难以解决的优化问题,要么只能容忍一小部分错误的估算者。 计算机科学理论科学的近期工作表明,在适当的分配模型中,可以有力地估计能够容忍腐败的固定部分的多元时间算法的平均值和共性。 然而,这些算法的样本和时间复杂性对于高维应用来说是惊人的。 在这项工作中,我们通过建立最优化的样本复杂度,直至对数因素,以及提供各种改进,使算法能够容忍更多腐败。 最后,我们用合成和真实的数据来显示,我们的算法具有最先进的性能,并突然使高维度强的估算成为现实的可能性。