We study high-dimensional robust statistics tasks in the streaming model. A recent line of work obtained computationally efficient algorithms for a range of high-dimensional robust estimation tasks. Unfortunately, all previous algorithms require storing the entire dataset, incurring memory at least quadratic in the dimension. In this work, we develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements (up to logarithmic factors). Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber's contamination model. We give an efficient single-pass streaming algorithm for this task with near-optimal error guarantees and space complexity nearly-linear in the dimension. As a corollary, we obtain streaming algorithms with near-optimal space complexity for several more complex tasks, including robust covariance estimation, robust regression, and more generally robust stochastic optimization.
翻译:我们研究流模式中的高维强度统计任务。 最近一行工作为一系列高维强度估算任务获得了计算高效的计算算法。 不幸的是,所有先前的算法都需要存储整个数据集,至少在这个维度中包含内存。 在这项工作中,我们开发了第一个具有近最佳内存要求(顶多对数因素)的高维强统计高效流算法。我们的主要结果就是在(加强)Huber的污染模型中进行高维强度平均估算。我们用近最佳误差保证和空间复杂性几乎线性来为这一任务提供高效的单流算法。作为必然的结果,我们获得了具有近最佳空间复杂性的流算法,用于几项更为复杂的任务,包括稳健的常变估计、稳健的回归以及更普遍的稳健健的随机优化。