The Gromov-Wasserstein (GW) distance quantifies dissimilarity between metric measure spaces and provides a meaningful figure of merit for applications involving heterogeneous data. While computational aspects of the GW distance have been widely studied, a strong duality theory and fundamental statistical questions concerning empirical convergence rates have remained obscure. This work closes these gaps for the $(2,2)$-GW distance (namely, with quadratic cost) over Euclidean spaces of different dimensions $d_x$ and $d_y$. We consider both the standard GW and the entropic GW (EGW) distances, derive their dual forms, and use them to analyze expected empirical convergence rates. The resulting rates are $n^{-2/\max\{d_x,d_y,4\}}$ (up to a log factor when $\max\{d_x,d_y\}=4$) and $n^{-1/2}$ for the two-sample GW and EGW problems, respectively, which match the corresponding rates for standard and entropic optimal transport distances. We also study stability of EGW in the entropic regularization parameter and establish approximation and continuity results for the cost and optimal couplings. Lastly, the duality is leveraged to shed new light on the open problem of the one-dimensional GW distance between uniform distributions on $n$ points, illuminating why the identity and anti-identity permutations may not be optimal. Our results serve as a first step towards a comprehensive statistical theory as well as computational advancements for GW distances, based on the discovered dual formulation.
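To make the one-dimensional question concrete, the following is a minimal illustrative sketch and not the method of this work. It assumes uniform weights on $n$ points, restricts attention to permutation couplings, and evaluates the quadratic distortion $\frac{1}{n^2}\sum_{i,j}\big(|x_i-x_j|^2-|y_{\sigma(i)}-y_{\sigma(j)}|^2\big)^2$ by brute force over all permutations $\sigma$ (feasible only for small $n$). The function names `gw_cost_perm` and `brute_force_gw` are hypothetical and introduced here only for illustration.

```python
import itertools
import numpy as np


def gw_cost_perm(x, y, perm):
    """Quadratic GW distortion of the permutation coupling x_i -> y_perm[i].

    Both point sets are sorted first, so the identity and anti-identity
    permutations refer to the sorted order. Uniform weights 1/n are assumed.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))[list(perm)]
    dx = (x[:, None] - x[None, :]) ** 2  # pairwise squared distances in x
    dy = (y[:, None] - y[None, :]) ** 2  # pairwise squared distances in permuted y
    return float(np.mean((dx - dy) ** 2))  # average over all n^2 index pairs


def brute_force_gw(x, y):
    """Exhaustive search over all n! permutations (small n only)."""
    n = len(x)
    best = min(itertools.permutations(range(n)),
               key=lambda p: gw_cost_perm(x, y, p))
    return best, gw_cost_perm(x, y, best)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.normal(size=6), rng.normal(size=6)
    identity = tuple(range(6))
    anti_identity = tuple(reversed(range(6)))
    best, cost = brute_force_gw(x, y)
    print("identity cost:     ", gw_cost_perm(x, y, identity))
    print("anti-identity cost:", gw_cost_perm(x, y, anti_identity))
    print("best permutation:  ", best, "cost:", cost)
```

Comparing the three printed costs on different inputs shows whether the identity or anti-identity permutation attains the brute-force minimum for a given pair of point sets; the sketch makes no claim about which case is typical.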