We provide a general framework for designing Generative Adversarial Networks (GANs) to solve high-dimensional robust statistics problems, which aim to estimate unknown parameters of the true distribution from adversarially corrupted samples. Prior work focuses on robust mean and covariance estimation when the true distribution lies in the family of Gaussian or elliptical distributions, and analyzes depth-based or scoring-rule-based GAN losses for these problems. Our work extends these results to robust mean estimation, second moment estimation, and robust linear regression when the true distribution has only bounded Orlicz norms, a class that includes the broad family of sub-Gaussian, sub-exponential, and bounded-moment distributions. We also provide a different set of sufficient conditions for the GAN loss to work: we only require its induced distance function to be the cumulative distribution function of some light-tailed distribution, a condition that is easily satisfied by neural networks with sigmoid activation. In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance, which overcomes the computational intractability of the original Kolmogorov-Smirnov distance used in prior work.
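To make the idea of a smoothed Kolmogorov-Smirnov distance concrete, the following is a minimal illustrative sketch, not the loss analyzed in this work. It assumes the discriminator is a single sigmoid unit applied to a one-dimensional projection, and it approximates the supremum over directions and thresholds by a random search. The function name `smoothed_ks` and the parameters `n_dirs`, `n_thresh`, and `scale` are hypothetical choices made for illustration only.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function (the CDF of the logistic distribution).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def smoothed_ks(x, y, n_dirs=200, n_thresh=50, scale=0.5, seed=0):
    """Approximate sup over (v, t) of |mean sigmoid((v.x - t)/scale) - mean sigmoid((v.y - t)/scale)|.

    Illustrative assumption: the hard indicator 1{v.x <= t} of the classical
    projected Kolmogorov-Smirnov distance is replaced by a sigmoid, and the
    supremum is approximated over random unit directions and a threshold grid.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]

    # Random unit directions v on the sphere.
    dirs = rng.normal(size=(n_dirs, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

    # One-dimensional projections, shape (n_samples, n_dirs).
    px = x @ dirs.T
    py = y @ dirs.T

    # Threshold grid covering the range of both projected samples.
    lo = min(px.min(), py.min())
    hi = max(px.max(), py.max())
    thresholds = np.linspace(lo, hi, n_thresh)

    best = 0.0
    for t in thresholds:
        fx = sigmoid((px - t) / scale).mean(axis=0)
        fy = sigmoid((py - t) / scale).mean(axis=0)
        best = max(best, float(np.abs(fx - fy).max()))
    return best
```

In a GAN-style estimator, `x` would hold the corrupted samples and `y` samples drawn from a candidate generator, with the generator parameters chosen to make this distance small. Because the sigmoid is differentiable, the inner maximization can in principle be handled by gradient methods rather than random search, which is the computational advantage over the hard-threshold distance.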