We study the problem of robust distribution estimation under the Wasserstein metric, a popular discrepancy measure between probability distributions rooted in optimal transport (OT) theory. We introduce a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass to be removed from its input distributions, and show that minimum distance estimation under $\mathsf{W}_p^\varepsilon$ achieves minimax optimal robust estimation risk. Our analysis is rooted in several new results for partial OT, including an approximate triangle inequality, which may be of independent interest. To address computational tractability, we derive a dual formulation for $\mathsf{W}_p^\varepsilon$ that adds a simple penalty term to the classic Kantorovich dual objective. As such, $\mathsf{W}_p^\varepsilon$ can be implemented via an elementary modification to standard, duality-based OT solvers. Our results are extended to sliced OT, where distributions are projected onto low-dimensional subspaces, and applications to homogeneity and independence testing are explored. We illustrate the virtues of our framework via applications to generative modeling with contaminated datasets.