Semi-discrete optimal transport problems, which evaluate the Wasserstein distance between a discrete and a generic (possibly non-discrete) probability measure, are believed to be computationally hard. Even though such problems are ubiquitous in statistics, machine learning and computer vision, however, this perception has not yet received a theoretical justification. To fill this gap, we prove that computing the Wasserstein distance between a discrete probability measure supported on two points and the Lebesgue measure on the standard hypercube is already #P-hard. This insight prompts us to seek approximate solutions for semi-discrete optimal transport problems. We thus perturb the underlying transportation cost with an additive disturbance governed by an ambiguous probability distribution, and we introduce a distributionally robust dual optimal transport problem whose objective function is smoothed with the most adverse disturbance distributions from within a given ambiguity set. We further show that smoothing the dual objective function is equivalent to regularizing the primal objective function, and we identify several ambiguity sets that give rise to several known and new regularization schemes. As a byproduct, we discover an intimate relation between semi-discrete optimal transport problems and discrete choice models traditionally studied in psychology and economics. To solve the regularized optimal transport problems efficiently, we use a stochastic gradient descent algorithm with imprecise stochastic gradient oracles. A new convergence analysis reveals that this algorithm improves the best known convergence guarantee for semi-discrete optimal transport problems with entropic regularizers.
翻译:半分解的最佳运输问题评估了离散和通用(可能非分解)概率度量之间的瓦瑟斯坦距离。 据认为,这种问题在计算上很难。 尽管这些问题在统计、机器学习和计算机视觉方面是无处不在的, 但是, 这种感觉还没有得到理论上的理由。 为了填补这一差距, 我们证明计算在两个点上支持的离散概率度尺度与标准超立方的莱贝斯格测量值之间的瓦瑟斯坦距离已经很硬。 这种洞察促使我们寻找半分异最佳运输问题的近似解决办法。 我们因此在潜在的运输成本中渗透一种由模糊概率分布决定的添加性扰动, 我们引入了一种分布稳健的双重最佳运输问题, 其目标功能与在给定的模糊性范围内最差的干扰分布相平滑。 我们进一步证明, 平滑两重目标功能相当于对原始目标功能的正规化, 我们发现一些模糊的设置, 使得一些已知的和新的正规的运输计划得以建立起来。 作为产品, 我们发现一个与最接近的不精确的运输成本分析, 我们与一个已知的不固定的不精确的运输模型和最优级分析。