We study the problem of robust mean estimation and introduce a novel Hamming-distance-based measure of distribution shift for coordinate-level corruptions. We show that this measure yields adversary models that capture more realistic corruptions than those used in prior work, and we present an information-theoretic analysis of robust mean estimation in these settings. We show that, for structured distributions, methods that leverage the structure yield information-theoretically more accurate mean estimates. We also consider practical algorithms for robust mean estimation: we study when approaches inspired by data cleaning, which first fix corruptions in the input data and then perform robust mean estimation, can match the information-theoretic bounds of our analysis. Finally, we demonstrate experimentally that this two-step approach outperforms structure-agnostic robust estimation and provides accurate mean estimation even for high-magnitude corruptions.
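To make the two-step idea concrete, the following is a minimal sketch under assumptions that the abstract does not state: corrupted coordinates are flagged as missing (NaN), each clean sample lies exactly in a known linear subspace x = Az, and the coordinate-wise median stands in for a generic robust mean estimator. The matrix `A`, the helper `impute_with_linear_structure`, and the synthetic data are hypothetical and purely illustrative.

```python
import numpy as np


def impute_with_linear_structure(X, A):
    """Step 1 (fix corruptions): fill NaN entries of each row of X.

    Assumes (illustrative) that every clean row equals A @ z for some
    latent vector z, and that enough coordinates are observed in each
    row for A restricted to those coordinates to have full column rank.
    """
    X_filled = X.copy()
    for i, row in enumerate(X):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        # Recover the latent vector from the observed coordinates only.
        z, *_ = np.linalg.lstsq(A[observed], row[observed], rcond=None)
        # Reconstruct the missing coordinates from the structure.
        X_filled[i, missing] = A[missing] @ z
    return X_filled


# Synthetic data: 6 observed coordinates driven by 2 latent factors.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
              [1.0, -1.0], [2.0, 1.0], [1.0, 2.0]])
mu_z = np.array([1.0, -2.0])
Z = rng.normal(loc=mu_z, size=(2000, 2))
X_clean = Z @ A.T

# Coordinate-level corruption: hide up to 3 of the 6 coordinates per sample.
X_corrupt = X_clean.copy()
for i in range(X_corrupt.shape[0]):
    k = rng.integers(0, 4)
    idx = rng.choice(6, size=k, replace=False)
    X_corrupt[i, idx] = np.nan

# Step 2 (robust estimation): coordinate-wise median of the repaired data.
X_filled = impute_with_linear_structure(X_corrupt, A)
estimate = np.median(X_filled, axis=0)
true_mean = A @ mu_z
```

In this toy setting every sample may have several corrupted coordinates, yet the structure allows the hidden values to be recovered before estimation, which a structure-agnostic estimator applied directly to `X_corrupt` could not exploit.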