Computing an optimal transport (OT) coupling between distributions plays an increasingly important role in machine learning. While OT problems can be solved as linear programs, adding an entropic smoothing term is known to result in solvers that are faster, more robust to outliers, differentiable, and easier to parallelize. The Sinkhorn fixed-point algorithm is the cornerstone of these approaches, and, as a result, multiple attempts have been made to shorten its runtime using, for instance, annealing, momentum or acceleration. The premise of this paper is that \textit{initialization} of the Sinkhorn algorithm has received comparatively little attention, possibly due to two preconceptions: first, since the regularized OT problem is convex, it may not seem worth crafting a tailored initialization, as \textit{any} is guaranteed to work; second, because the Sinkhorn algorithm is often differentiated in end-to-end pipelines, data-dependent initializations could potentially bias gradient estimates obtained by unrolling iterations. We challenge this conventional wisdom and show that carefully chosen initializations can result in dramatic speed-ups, and do not bias gradients when these are computed with implicit differentiation. We detail how such initializations can be recovered from closed-form or approximate OT solutions, using known results in the 1D or Gaussian settings. We show empirically that these initializations can be used off-the-shelf, with little to no tuning, and result in consistent speed-ups for a variety of OT problems.
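To make the warm-start idea concrete, the following is a minimal log-domain Sinkhorn sketch in NumPy/SciPy that accepts an optional initial dual potential `f_init` in place of the usual zero vector. The function name, signature, and stopping criterion are illustrative assumptions, not the paper's implementation or any particular library's API.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn(C, a, b, eps, f_init=None, tol=1e-6, max_iter=10_000):
    """Log-domain Sinkhorn for entropic OT between histograms a and b.

    Returns the dual potentials (f, g) and the number of iterations used;
    `f_init` warm-starts the first potential (default: the usual zeros).
    """
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros_like(a) if f_init is None else np.asarray(f_init, dtype=float)
    for it in range(1, max_iter + 1):
        # Alternating soft c-transforms (log-sum-exp updates) of the potentials.
        g = eps * (log_b - logsumexp((f[:, None] - C) / eps, axis=0))
        f = eps * (log_a - logsumexp((g[None, :] - C) / eps, axis=1))
        # After the f-update the row marginals match `a` exactly, so stop
        # once the column marginals are also within `tol` of `b`.
        cols = np.exp((f[:, None] + g[None, :] - C) / eps).sum(axis=0)
        if np.abs(cols - b).max() < tol:
            return f, g, it
    return f, g, max_iter
```

With the default zero initialization this is exactly the standard Sinkhorn loop; the speed-ups discussed above come from passing a good `f_init`, as sketched next for the Gaussian setting.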
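One source of such initializations is the Gaussian setting: fit Gaussians to the two point clouds and evaluate the known closed-form Kantorovich potential between them at the source points. The sketch below assumes the half-squared-Euclidean cost $c(x, y) = \tfrac{1}{2}\|x - y\|^2$ and uses the standard Brenier map between Gaussians; `gaussian_init` is a hypothetical helper, and the exact recipe in the paper may differ.

```python
from scipy.linalg import sqrtm

def gaussian_init(X, Y):
    """Closed-form Gaussian dual potential, evaluated at the source points X.

    Fits N(m0, S0) and N(m1, S1) to the two samples (shape (n, d), d >= 2)
    and evaluates the Kantorovich potential of the OT problem between these
    Gaussians, known in closed form for c(x, y) = 0.5 * ||x - y||^2.
    """
    m0, m1 = X.mean(axis=0), Y.mean(axis=0)
    S0, S1 = np.cov(X, rowvar=False), np.cov(Y, rowvar=False)
    r = np.real(sqrtm(S0))
    r_inv = np.linalg.inv(r)
    A = r_inv @ np.real(sqrtm(r @ S1 @ r)) @ r_inv  # Monge map: x -> m1 + A (x - m0)
    Xc = X - m0
    # Brenier potential phi(x) = 0.5 (x-m0)^T A (x-m0) + <x, m1> (up to a constant);
    # the Kantorovich potential for the half-squared cost is 0.5 ||x||^2 - phi(x).
    phi = 0.5 * np.einsum("ni,ij,nj->n", Xc, A, Xc) + X @ m1
    return 0.5 * (X**2).sum(axis=1) - phi

# Toy comparison: the warm start typically cuts the iteration count.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(200, 2)), rng.normal(size=(200, 2)) + 3.0
a = b = np.full(200, 1 / 200)
C = 0.5 * ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
_, _, n_zero = sinkhorn(C, a, b, eps=0.1)
_, _, n_warm = sinkhorn(C, a, b, eps=0.1, f_init=gaussian_init(X, Y))
```

Because the entropic problem is convex, such a warm start cannot hurt correctness: any initialization converges to the same potentials up to an additive constant, which is precisely why \textit{any} initialization is guaranteed to work.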