Optimal transport (OT) measures distances between distributions in a way that depends on the geometry of the sample space. Thanks to recent advances in computational OT, OT distances are widely used as loss functions in machine learning. Despite their prevalence and advantages, OT loss functions can be extremely sensitive to outliers. In fact, a single adversarially picked outlier can make the standard $W_2$ distance arbitrarily large. To address this issue, we propose an outlier-robust formulation of OT. Our formulation is convex but appears challenging to scale at first glance. Our main contribution is deriving an \emph{equivalent} formulation based on cost truncation that is easy to incorporate into modern algorithms for computational OT. We demonstrate the benefits of our formulation on mean estimation problems under the Huber contamination model in simulations and on outlier detection tasks with real data.
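
The sensitivity to a single outlier, and the effect of truncating the cost, can be illustrated with a small numerical sketch. It is not the paper's algorithm: it assumes two equal-size, uniformly weighted 1-D samples (so that discrete OT reduces to an assignment problem), and the function name `ot_cost` and the truncation level `truncate_at` are hypothetical choices made only for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def ot_cost(x, y, truncate_at=None):
    """OT cost between two equal-size, uniformly weighted 1-D samples.

    With uniform weights and equal sample sizes, an optimal transport plan
    can be taken to be a permutation, so an assignment solver suffices.
    `truncate_at` is a hypothetical threshold: if given, the squared-distance
    cost is capped at that value before solving.
    """
    cost = (x[:, None] - y[None, :]) ** 2      # squared Euclidean ground cost
    if truncate_at is not None:
        cost = np.minimum(cost, truncate_at)   # cost truncation
    rows, cols = linear_sum_assignment(cost)   # optimal matching
    return cost[rows, cols].mean()             # average cost per unit of mass


rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = rng.normal(size=50)

results = {}
for outlier in (10.0, 100.0, 1000.0):
    y_contaminated = y.copy()
    y_contaminated[0] = outlier                # replace one point by an outlier
    plain = ot_cost(x, y_contaminated)
    truncated = ot_cost(x, y_contaminated, truncate_at=4.0)
    results[outlier] = (plain, truncated)
    print(f"outlier={outlier:7.1f}  plain={plain:12.3f}  truncated={truncated:.3f}")
```

In this sketch the untruncated cost grows without bound as the outlier moves away, while the truncated cost can never exceed `truncate_at`, because every entry of the cost matrix is capped at that value. How the truncation level should be tied to the robust formulation is not specified in the abstract, so the value 4.0 here is arbitrary.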