Robust statistics traditionally focuses on outliers, or perturbations in total variation distance. However, a dataset could be corrupted in many other ways, such as systematic measurement errors and missing covariates. We generalize the robust statistics approach to consider perturbations under any Wasserstein distance, and show that robust estimation is possible whenever a distribution's population statistics are robust under a certain family of friendly perturbations. This generalizes a property called resilience, previously employed in the special case of mean estimation with outliers. We justify the generalized resilience property by showing that it holds under moment or hypercontractive conditions. Even in the total variation case, these subsume conditions in the literature for mean estimation, regression, and covariance estimation; the resulting analysis simplifies and sometimes improves these known results in both the population limit and the finite-sample rate. Our robust estimators are based on minimum distance (MD) functionals (Donoho and Liu, 1988), which project onto a set of distributions under a discrepancy related to the perturbation. We present two approaches for designing MD estimators with good finite-sample rates: weakening the discrepancy and expanding the set of distributions. We also present connections to the recent analysis by Gao et al. (2019) of generative adversarial networks for robust estimation.
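As a rough illustration of the resilience idea in the special case the abstract mentions (mean estimation with outliers), the sketch below measures how far the mean of a one-dimensional sample can move when an eps fraction of points is deleted. It assumes the deletion-based notion of resilience from the earlier mean-estimation literature, not the generalized Wasserstein version developed in the paper. The function name `empirical_resilience_1d` is hypothetical and chosen only for this example.

```python
import numpy as np


def empirical_resilience_1d(x, eps):
    """Largest shift of the sample mean after deleting floor(eps * n) points.

    Hypothetical helper for illustration only. In one dimension the worst
    case over all deletions is attained by removing either the largest or
    the smallest points, so only those two cases need to be checked.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    k = int(np.floor(eps * n))  # number of points an adversary may delete
    if k == 0:
        return 0.0
    mu = x.mean()
    shift_drop_top = abs(x[: n - k].mean() - mu)   # delete the k largest
    shift_drop_bottom = abs(x[k:].mean() - mu)     # delete the k smallest
    return max(shift_drop_top, shift_drop_bottom)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    light_tailed = rng.normal(size=10_000)
    heavy_tailed = rng.standard_cauchy(size=10_000)
    # A small value means the mean is stable under deletions; heavy-tailed
    # samples typically give a much larger value.
    print(empirical_resilience_1d(light_tailed, 0.05))
    print(empirical_resilience_1d(heavy_tailed, 0.05))
```

A small return value indicates that the sample mean is stable under deletion of a small fraction of points, which is the kind of stability that moment conditions are meant to guarantee.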