Originating in optimal transport, the Wasserstein distance has gained importance in machine learning due to its appealing geometrical properties and the increasing availability of efficient approximations. In this work, we consider the problem of estimating the Wasserstein distance between two probability distributions when the observations are contaminated by outliers. To that end, we investigate how to leverage Median-of-Means (MoM) estimators to robustify the estimation of the Wasserstein distance. Exploiting the Kantorovich dual formulation of the Wasserstein distance, we introduce and discuss novel MoM-based robust estimators, study their consistency under a data contamination model, and provide convergence rates. These MoM estimators make it possible to render Wasserstein Generative Adversarial Networks (WGANs) robust to outliers, as shown by an empirical study on two benchmarks, CIFAR10 and Fashion MNIST. Finally, we discuss how to combine MoM with the entropy-regularized approximation of the Wasserstein distance and propose a simple MoM-based re-weighting scheme that could be used in conjunction with the Sinkhorn algorithm.
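To make the Median-of-Means idea concrete, the following minimal Python sketch splits a sample into blocks, averages each block, and takes the median of the block means. It then plugs this into the dual objective for a single fixed test function. This is only an illustration under stated assumptions, not the estimators studied in the paper: the function names `median_of_means` and `mom_dual_objective`, the random block assignment, and the choice of a fixed `phi` are all hypothetical. The Kantorovich dual takes a supremum over 1-Lipschitz functions, which this sketch does not attempt.

```python
import numpy as np


def median_of_means(values, n_blocks, rng=None):
    """Median-of-Means estimate of the mean of a 1-D sample.

    The sample is shuffled, split into `n_blocks` nearly equal blocks,
    and the median of the block means is returned.
    """
    values = np.asarray(values, dtype=float).ravel()
    if not 1 <= n_blocks <= values.size:
        raise ValueError("n_blocks must be between 1 and the sample size")
    rng = np.random.default_rng(rng)
    shuffled = rng.permutation(values)
    blocks = np.array_split(shuffled, n_blocks)
    block_means = np.array([block.mean() for block in blocks])
    return float(np.median(block_means))


def mom_dual_objective(phi, x, y, n_blocks_x, n_blocks_y, rng=None):
    """Robust plug-in for E[phi(X)] - E[phi(Y)] with a FIXED function phi.

    Assumption: `phi` maps an array of samples to a 1-D array of values
    and is 1-Lipschitz. No optimisation over phi is done here.
    """
    rng = np.random.default_rng(rng)
    return (median_of_means(phi(x), n_blocks_x, rng)
            - median_of_means(phi(y), n_blocks_y, rng))


# Toy data: 95 clean points equal to 1.0 and 5 gross outliers.
contaminated = np.array([1.0] * 95 + [1000.0] * 5)
clean_reference = np.zeros(100)

naive_mean = float(contaminated.mean())
robust_mean = median_of_means(contaminated, n_blocks=11, rng=0)
robust_gap = mom_dual_objective(lambda a: a, contaminated, clean_reference,
                                n_blocks_x=11, n_blocks_y=11, rng=0)
```

With 11 blocks and only 5 outliers, at least 6 blocks contain no outlier, so the median of the block means is unaffected by the contamination, whereas the plain empirical mean is pulled far from 1. Choosing more blocks than twice the number of outliers is the usual rule of thumb behind this behaviour.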