We consider the fundamental problem of sampling from the optimal transport coupling between given source and target distributions. In certain cases, the optimal transport plan takes the form of a one-to-one mapping from the source support to the target support, but learning or even approximating such a map is computationally challenging for large and high-dimensional datasets due to the high cost of linear programming routines and an intrinsic curse of dimensionality. We study instead the Sinkhorn problem, a regularized form of optimal transport whose solutions are couplings between the source and target distributions. We introduce a novel framework for learning the Sinkhorn coupling between two distributions in the form of a score-based generative model. Conditioned on source data, our procedure iterates Langevin dynamics to sample target data according to the regularized optimal coupling. Key to this approach is a neural network parametrization of the Sinkhorn problem, and we prove convergence of gradient descent with respect to network parameters in this formulation. We demonstrate its empirical success on a variety of large-scale optimal transport tasks.
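For readers unfamiliar with the Sinkhorn problem, the following is a minimal sketch of the classical discrete setting, not the neural, score-based method introduced here. It assumes small empirical distributions with uniform weights and a squared Euclidean cost. The function names, the regularization strength `eps`, and the iteration count are illustrative assumptions. The sketch computes an entropy-regularized coupling by alternating marginal scalings, then draws a target index conditioned on a source point.

```python
import numpy as np

def sinkhorn_coupling(a, b, cost, eps=1.0, n_iters=1000):
    """Entropy-regularized OT plan between discrete marginals a (n,) and b (m,).

    Classical Sinkhorn scaling iterations; eps and n_iters are illustrative.
    """
    K = np.exp(-cost / eps)      # Gibbs kernel built from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)          # rescale rows to match the source marginal
        v = b / (K.T @ u)        # rescale columns to match the target marginal
    return u[:, None] * K * v[None, :]

def sample_target_given_source(plan, i, rng):
    """Draw a target index from the coupling conditioned on source index i."""
    cond = plan[i] / plan[i].sum()
    return rng.choice(len(cond), p=cond)

# Toy data (assumed for illustration): two small point clouds in the plane.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))              # source samples
y = rng.normal(loc=1.0, size=(6, 2))     # target samples
a = np.full(5, 1 / 5)                    # uniform source weights
b = np.full(6, 1 / 6)                    # uniform target weights
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

plan = sinkhorn_coupling(a, b, cost)
j = sample_target_given_source(plan, 0, rng)
```

In this discrete sketch the conditional distribution is available in closed form from a row of the plan. In the continuous, high-dimensional setting considered here, no such table exists, which is why target samples are instead drawn by iterating Langevin dynamics.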