The ability to compare and align related datasets living in heterogeneous spaces plays an increasingly important role in machine learning. The Gromov-Wasserstein (GW) formalism can help tackle this problem. Its main goal is to seek an assignment (more generally, a coupling matrix) that can register points across otherwise incomparable datasets. As a non-convex, quadratic generalization of optimal transport (OT), GW is NP-hard. Yet heuristics are known to work reasonably well in practice, the state-of-the-art approach being to solve a sequence of nested regularized OT problems. While popular, that heuristic remains too costly to scale, with cubic complexity in the number of samples $n$. We show in this paper how a recent variant of the Sinkhorn algorithm can substantially speed up the resolution of GW. That variant restricts the set of admissible couplings to those admitting a low-rank factorization as the product of two sub-couplings. By alternately updating each sub-coupling, our algorithm computes a stationary point of the problem in quadratic time with respect to the number of samples. When the cost matrices themselves have low rank, our algorithm has linear time complexity, $\mathcal{O}(n)$. We demonstrate the efficiency of our method on simulated and real data.
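To make the complexity claims concrete, here is a minimal NumPy sketch of why a factored coupling is cheaper to work with. It is not the paper's algorithm: it only evaluates the quadratic cross term $\langle A P B^\top, P \rangle$ that appears in the GW objective, and it assumes the factorization takes the form $P = Q\,\mathrm{diag}(1/g)\,R^\top$, where $Q$ and $R$ are the two sub-couplings and $g$ is their shared marginal (the abstract does not spell out this form). The helper names `block_coupling`, `cross_term_dense` and `cross_term_factored`, the toy sub-couplings, and the random low-rank cost factors `A1, A2, B1, B2` are all hypothetical and chosen for illustration only.

```python
import numpy as np

def block_coupling(n, r):
    """Toy sub-coupling between uniform weights on n points and on r clusters.

    Point i sends all of its mass 1/n to cluster i % r (needs n divisible by r),
    so the row sums are 1/n and the column sums are 1/r.
    """
    assert n % r == 0
    Q = np.zeros((n, r))
    Q[np.arange(n), np.arange(n) % r] = 1.0 / n
    return Q

def cross_term_dense(A, B, P):
    """<A P B^T, P> with every matrix materialized (cubic cost when m ~ n)."""
    return np.sum((A @ P @ B.T) * P)

def cross_term_factored(A1, A2, B1, B2, Q, R, g):
    """Same quantity for A = A1 A2^T, B = B1 B2^T, P = Q diag(1/g) R^T.

    No n x n, n x m or m x m matrix is formed, so the cost is linear in n and m.
    """
    U = A1 @ (A2.T @ Q)   # equals A @ Q, shape (n, r)
    V = B1 @ (B2.T @ R)   # equals B @ R, shape (m, r)
    G1 = U.T @ Q          # shape (r, r)
    G2 = R.T @ V          # shape (r, r)
    # trace(D G1 D G2) with D = diag(1/g)
    return np.trace((G1 / g[:, None]) @ (G2 / g[:, None]))

rng = np.random.default_rng(0)
n, m, r, d = 12, 8, 4, 3
A1, A2 = rng.normal(size=(n, d)), rng.normal(size=(n, d))
B1, B2 = rng.normal(size=(m, d)), rng.normal(size=(m, d))

g = np.full(r, 1.0 / r)
Q, R = block_coupling(n, r), block_coupling(m, r)
P = (Q / g) @ R.T         # dense coupling, built only to check the result

dense = cross_term_dense(A1 @ A2.T, B1 @ B2.T, P)
factored = cross_term_factored(A1, A2, B1, B2, Q, R, g)
print(np.isclose(dense, factored))
```

If the cost matrices are full rank, `U` and `V` have to be computed as `A @ Q` and `B @ R` directly, which costs $\mathcal{O}(n^2 r)$ and matches the quadratic regime described above. The linear regime relies on the low-rank cost factors.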