We study the problem of robustly estimating the mean of a $d$-dimensional distribution given $N$ examples, where $\varepsilon N$ examples may be arbitrarily corrupted and most coordinates of every example may be missing. Assuming each coordinate appears in a constant factor more than $\varepsilon N$ examples (that is, in at least $C\varepsilon N$ examples for a suitable constant $C > 1$), we give algorithms that estimate the mean of the distribution with information-theoretically optimal, dimension-independent error guarantees in nearly linear time $\widetilde O(Nd)$. Our results extend recent work on computationally efficient robust estimation to a more widely applicable incomplete-data setting.
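To make the input model concrete, here is a minimal sketch under stated assumptions. It represents missing data as a boolean mask, checks the coverage assumption that every coordinate is observed in more than $C\varepsilon N$ examples, and computes a naive coordinate-wise median of the observed entries. The function names `observation_counts`, `satisfies_coverage` and `coordinatewise_median` and the constant `c` are hypothetical illustrations. The median baseline is **not** the algorithm described above: in general its $\ell_2$ error can grow with $\sqrt{d}$, whereas the stated guarantees are dimension-independent.

```python
import random
import statistics


def observation_counts(mask):
    """Number of examples in which each coordinate is observed.

    mask[i][j] is True when coordinate j of example i is present.
    """
    d = len(mask[0])
    return [sum(row[j] for row in mask) for j in range(d)]


def satisfies_coverage(mask, eps, c):
    """Hypothetical check of the coverage assumption: every coordinate is
    observed in more than c * eps * N examples, for a constant c > 1."""
    n = len(mask)
    return all(k > c * eps * n for k in observation_counts(mask))


def coordinatewise_median(data, mask):
    """Naive baseline, NOT the method from the abstract: the median of the
    observed entries of each coordinate. Its l2 error can grow with sqrt(d)."""
    d = len(mask[0])
    estimate = []
    for j in range(d):
        seen = [row[j] for row, m in zip(data, mask) if m[j]]
        if not seen:
            raise ValueError(f"coordinate {j} is never observed")
        estimate.append(statistics.median(seen))
    return estimate


if __name__ == "__main__":
    random.seed(0)
    N, d, eps = 2000, 5, 0.05
    n_bad = int(eps * N)
    data, mask = [], []
    for i in range(N):
        if i < n_bad:
            # Assumed adversary: fully observed examples with every entry 100.
            data.append([100.0] * d)
            mask.append([True] * d)
        else:
            # Clean examples from N(0, I); each coordinate kept w.p. 0.3.
            data.append([random.gauss(0.0, 1.0) for _ in range(d)])
            mask.append([random.random() < 0.3 for _ in range(d)])
    print(satisfies_coverage(mask, eps, c=4))
    print(coordinatewise_median(data, mask))
```

The per-coordinate count matters because the corrupted examples supply at most $\varepsilon N$ entries of any coordinate: if that coordinate is observed more than $2\varepsilon N$ times, corrupted entries are fewer than half of its observed values, which a median can tolerate. If a coordinate were observed in at most $\varepsilon N$ examples, every one of its observed entries could be corrupted.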