Regularized optimal transport (OT) is increasingly used as a loss or as a matching layer in neural networks. Entropy-regularized OT can be computed with the Sinkhorn algorithm, but it yields fully dense transportation plans, meaning that every source is (fractionally) matched with every target. To address this issue, several works have investigated quadratic regularization instead. This regularization preserves sparsity and leads to unconstrained and smooth (semi-)dual objectives, which can be solved with off-the-shelf gradient methods. Unfortunately, quadratic regularization does not give direct control over the cardinality (number of nonzeros) of the transportation plan. In this paper, we propose a new approach for OT with explicit cardinality constraints on the transportation plan. Our work is motivated by an application to sparse mixtures of experts, where OT can be used to match input tokens, such as image patches, with expert models, such as neural networks. Cardinality constraints ensure that at most $k$ tokens are matched with an expert, which is crucial for computational performance. Despite the nonconvexity of cardinality constraints, we show that the corresponding (semi-)dual problems are tractable and can be solved with first-order gradient methods. Our method can be thought of as a middle ground between unregularized OT (recovered in the limit case $k=1$) and quadratically regularized OT (recovered when $k$ is large enough). The smoothness of the objectives increases as $k$ increases, giving rise to a trade-off between convergence speed and sparsity of the optimal plan.
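
To make the density issue concrete, the following is a minimal, dependency-free sketch rather than the method proposed in the paper. It runs plain Sinkhorn iterations on a toy 3x3 problem and counts the nonzeros of the resulting plan. The function names `sinkhorn` and `project_top_k_nonneg`, the toy cost matrix, and the values of `epsilon` and `n_iters` are all illustrative assumptions. The second helper only illustrates what a per-column cardinality constraint looks like (a Euclidean projection onto non-negative vectors with at most $k$ nonzeros); it is not the (semi-)dual solver described above.

```python
import math


def sinkhorn(cost, a, b, epsilon=0.1, n_iters=500):
    """Entropy-regularized OT via Sinkhorn iterations (illustrative sketch)."""
    m, n = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / epsilon): every entry is strictly positive.
    K = [[math.exp(-cost[i][j] / epsilon) for j in range(n)] for i in range(m)]
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(m)) for j in range(n)]
    # Transportation plan P_ij = u_i * K_ij * v_j.
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(m)]


def project_top_k_nonneg(column, k):
    """Euclidean projection onto non-negative vectors with at most k nonzeros.

    Assumption for illustration only: keep the k largest entries, clip
    negatives to zero, and set every other entry to zero.
    """
    keep = sorted(range(len(column)), key=lambda i: column[i], reverse=True)[:k]
    out = [0.0] * len(column)
    for i in keep:
        out[i] = max(column[i], 0.0)
    return out


# Toy problem: 3 sources, 3 targets, uniform marginals.
cost = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 1.0],
        [2.0, 1.0, 0.0]]
a = [1.0 / 3.0] * 3
b = [1.0 / 3.0] * 3

plan = sinkhorn(cost, a, b)
num_nonzero = sum(1 for row in plan for p in row if p > 0.0)
print("nonzeros in the Sinkhorn plan:", num_nonzero, "out of", len(a) * len(b))

# A cardinality-constrained column keeps at most k nonzero entries.
sparse_column = project_top_k_nonneg([0.5, -0.2, 0.3, 0.1], k=2)
print("top-2 non-negative projection:", sparse_column)
```

In this sketch every entry of the Sinkhorn plan is strictly positive because the kernel entries are, which is the density the abstract refers to. The projection helper shows how a hard budget of $k$ nonzeros per column behaves on a single vector.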