Recently used in various machine learning contexts, the Gromov-Wasserstein distance (GW) allows for comparing distributions whose supports do not necessarily lie in the same metric space. However, this Optimal Transport (OT) distance requires solving a complex non-convex quadratic program that is often very costly in both time and memory. In contrast to GW, the Wasserstein distance (W) enjoys several properties (e.g. duality) that permit large scale optimization. Among those, the solution of W on the real line, which only requires sorting discrete samples in 1D, allows defining the Sliced Wasserstein (SW) distance. This paper proposes a new divergence based on GW akin to SW. We first derive a closed form for GW when dealing with 1D distributions, based on a new result for the related quadratic assignment problem. We then define a novel OT discrepancy that can deal with large scale distributions via a slicing approach, and we show how it relates to the GW distance while being $O(n\log(n))$ to compute. We illustrate the behavior of this so-called Sliced Gromov-Wasserstein (SGW) discrepancy in experiments where we demonstrate its ability to tackle problems similar to those addressed by GW while being several orders of magnitude faster to compute.
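To illustrate the slicing principle the abstract refers to (the 1D Wasserstein solution via sorting, averaged over random projections), here is a minimal NumPy sketch. It is an assumption-laden illustration of Sliced Wasserstein only, not the SGW implementation or closed-form 1D GW result described in the paper; function names and the choice of 50 projections are hypothetical.

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """1D p-Wasserstein between two empirical measures with the same
    number of samples: sort both and match order statistics."""
    xs, ys = np.sort(x), np.sort(y)
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)

def sliced_wasserstein(X, Y, n_proj=50, p=2, seed=None):
    """Monte-Carlo Sliced Wasserstein between point clouds X and Y
    living in the same ambient dimension (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)      # random direction on the unit sphere
        total += wasserstein_1d(X @ theta, Y @ theta, p) ** p
    return (total / n_proj) ** (1.0 / p)

# Toy usage: two 3D point clouds with a shift between them.
X = np.random.randn(200, 3)
Y = np.random.randn(200, 3) + 1.0
print(sliced_wasserstein(X, Y, seed=0))
```

Each projection costs $O(n\log(n))$ because of the sort, which is the property the paper leverages when carrying the slicing idea over to GW.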