Optimal transport (OT) has become a widely used tool in machine learning for measuring the discrepancy between probability distributions. For instance, OT is a popular loss function that quantifies the discrepancy between an empirical distribution and a parametric model. Recently, an entropic penalty term together with the celebrated Sinkhorn algorithm has been commonly used to approximate the original OT problem in a computationally efficient way. However, because the Sinkhorn algorithm performs a projection associated with the Kullback-Leibler divergence, it is often vulnerable to outliers. To overcome this problem, we propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence, which was developed in robust statistics. Our theoretical analysis reveals that the $\beta$-potential can prevent mass from being transported to outliers. We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers. In addition, our proposed method can successfully detect outliers in a contaminated dataset.
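To make the entropic approximation referenced above concrete, the following is a minimal NumPy sketch of the standard Sinkhorn iterations, i.e., the KL-projection baseline the abstract describes, not the proposed $\beta$-potential variant, whose update rule is not reproduced here. The function name `sinkhorn` and the parameters `eps` and `n_iter` are illustrative choices, not identifiers from the paper.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, n_iter=1000):
    """Entropically regularized OT via standard Sinkhorn iterations (sketch).

    a, b : source/target marginals (1-D arrays, each summing to 1)
    C    : cost matrix of shape (len(a), len(b))
    eps  : entropic regularization strength
    Returns a transport plan P with row sums ~ a and column sums ~ b.
    """
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        # Alternating multiplicative updates = alternating KL projections onto
        # the row/column marginal constraints. Every sample, outlier or not,
        # must receive its full marginal mass, which is the vulnerability the
        # abstract points to.
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]    # P = diag(u) K diag(v)

# Toy usage: two small empirical distributions on the line, one point of the
# target placed far away to play the role of an outlier.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.1, 1.1, 5.0])             # 5.0 acts as the outlier
C = (x[:, None] - y[None, :]) ** 2        # squared-distance cost
a = np.full(3, 1 / 3)
b = np.full(3, 1 / 3)
P = sinkhorn(a, b, C)
print(P.round(3))   # a full third of the mass is forced onto the outlier column
```

The printed plan shows that the KL projections transport a fixed share of mass to the outlier regardless of how costly it is, which is exactly the failure mode the $\beta$-potential regularization is designed to suppress.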