We study the problem of robustly estimating the mean of a $d$-dimensional distribution given $N$ examples, where most coordinates of every example may be missing and $\varepsilon N$ examples may be arbitrarily corrupted. Assuming each coordinate appears in a constant factor more than $\varepsilon N$ examples, we give algorithms that estimate the mean of the distribution with information-theoretically optimal, dimension-independent error guarantees in nearly-linear time $\widetilde O(Nd)$. Our results extend recent work on computationally efficient robust estimation to a more widely applicable incomplete-data setting.
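To make the setting concrete, the following is a minimal sketch of the incomplete, corrupted data model, not the algorithm referred to above. Everything in it is an illustrative assumption: the function names, the Gaussian inliers with true mean zero, the outlier value `100.0`, and the independent per-coordinate observation probability `p_obs`. It compares the naive mean over observed entries with a coordinate-wise median, a simple robust baseline whose $\ell_2$ error is known to grow with the dimension, unlike the dimension-independent guarantee described above.

```python
import random
import statistics


def make_data(n, d, eps, p_obs, seed=0):
    """Generate n examples in d dimensions; None marks a missing coordinate.

    Assumptions for illustration: inliers are standard Gaussian (true mean 0),
    the first int(eps * n) examples are corrupted, and each coordinate of each
    example is observed independently with probability p_obs.
    """
    rng = random.Random(seed)
    n_bad = int(eps * n)
    data = []
    for i in range(n):
        corrupted = i < n_bad
        row = []
        for _ in range(d):
            if rng.random() > p_obs:
                row.append(None)            # coordinate is missing
            elif corrupted:
                row.append(100.0)           # adversarial outlier value
            else:
                row.append(rng.gauss(0.0, 1.0))
        data.append(row)
    return data


def observed_columns(data, d):
    """Collect, for each coordinate, the values that were actually observed."""
    return [[row[j] for row in data if row[j] is not None] for j in range(d)]


def naive_mean(data, d):
    """Average the observed entries per coordinate (not robust)."""
    return [statistics.fmean(col) for col in observed_columns(data, d)]


def coordinatewise_median(data, d):
    """Median of the observed entries per coordinate (simple robust baseline)."""
    return [statistics.median(col) for col in observed_columns(data, d)]


def l2_norm(v):
    """Euclidean norm; equals the estimation error since the true mean is 0."""
    return sum(x * x for x in v) ** 0.5


if __name__ == "__main__":
    n, d, eps, p_obs = 2000, 20, 0.05, 0.3
    data = make_data(n, d, eps, p_obs)
    print("naive mean error: ", l2_norm(naive_mean(data, d)))
    print("median error:     ", l2_norm(coordinatewise_median(data, d)))
```

With these parameters each coordinate is observed in roughly $p_{\text{obs}} N = 600$ examples, comfortably more than $\varepsilon N = 100$, which mirrors the coverage assumption stated above. The naive mean is pulled far from zero by the outliers, while the median stays close.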