Outlier-Robust Optimal Transport

Optimal transport (OT) measures distances between distributions in a way that depends on the geometry of the sample space. In light of recent advances in computational OT, OT distances are widely used as loss functions in machine learning. Despite their prevalence and advantages, OT loss functions can be extremely sensitive to outliers. In fact, a single adversarially picked outlier can increase the standard $W_2$-distance arbitrarily. To address this issue, we propose an outlier-robust formulation of OT. Our formulation is convex but, at first glance, challenging to scale. Our main contribution is deriving an equivalent formulation based on cost truncation that is easy to incorporate into modern algorithms for computational OT. We demonstrate the benefits of our formulation in simulations of mean estimation under the Huber contamination model and in outlier detection tasks on real data.
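
To make the outlier sensitivity and the cost-truncation idea concrete, here is a minimal sketch, not the authors' implementation. It assumes two equal-size uniform empirical measures, so that exact OT reduces to an assignment problem, solved here with SciPy's `linear_sum_assignment`. The helper `ot_cost`, the truncation threshold `tau`, and the synthetic data are all hypothetical; the abstract does not say how the threshold relates to the parameters of the robust formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def ot_cost(x, y, tau=None):
    """OT cost between two equal-size uniform empirical measures.

    Uses squared Euclidean ground cost. If `tau` is given, every entry of
    the cost matrix is truncated at `tau` before solving (assumed form of
    cost truncation; `tau` is an illustrative parameter).
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    # Pairwise squared Euclidean distances, shape (n, n).
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    if tau is not None:
        cost = np.minimum(cost, tau)
    # With uniform weights and equal sizes, an optimal coupling is a permutation.
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()


rng = np.random.default_rng(0)
n, tau = 50, 4.0
clean = rng.normal(size=(n, 2))
target = rng.normal(size=(n, 2))

# Replace one sample with a far-away outlier.
contaminated = clean.copy()
contaminated[0] = [1e3, 1e3]

plain_clean = ot_cost(clean, target)            # squared W_2 on clean data
plain_bad = ot_cost(contaminated, target)       # grows with the outlier's distance
trunc_clean = ot_cost(clean, target, tau=tau)
trunc_bad = ot_cost(contaminated, target, tau=tau)

print(f"plain:     clean={plain_clean:.3f}  contaminated={plain_bad:.3f}")
print(f"truncated: clean={trunc_clean:.3f}  contaminated={trunc_bad:.3f}")
```

In this sketch the untruncated cost grows without bound as the outlier moves away, whereas with truncation a single corrupted sample can change the cost by at most `tau / n`, because each matched pair contributes at most `tau`.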


