Comparing two probability measures supported on heterogeneous spaces is an increasingly important problem in machine learning. Such problems arise when comparing, for instance, two populations of biological cells, each described with its own set of features, or when looking at families of word embeddings trained across different corpora or languages. For such settings, the Gromov-Wasserstein (GW) distance is often presented as the gold standard. GW is intuitive, as it quantifies whether one measure can be isomorphically mapped to the other. However, its exact computation is intractable, and most algorithms that claim to approximate it remain expensive. Building on \cite{memoli-2011}, who proposed to represent each point in each distribution as the 1D distribution of its distances to all other points, we introduce in this paper the Anchor Energy (AE) and Anchor Wasserstein (AW) distances, which are respectively the energy and Wasserstein distances instantiated on such representations. Our main contribution is a sweep line algorithm that computes AE \emph{exactly} in log-quadratic time, where a naive implementation would be cubic. This is quasi-linear with respect to the size of the problem description itself. Our second contribution is a set of robust variants of AE and AW that use rank statistics rather than the original distances. We show that AE and AW perform well in various experimental settings at a fraction of the computational cost of popular GW approximations. Code is available at \url{https://github.com/joisino/anchor-energy}.
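To make the construction concrete, the following is a minimal sketch of the naive, roughly cubic baseline mentioned above, not the sweep line algorithm and not the authors' implementation. It rests on several assumptions that the abstract does not state: points carry uniform weights, distances within each space are Euclidean, each point's 1D distribution includes its zero self-distance, and two such 1D distributions are compared with the 1D Wasserstein-1 distance. All function names are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix of a point cloud X with shape (n, d)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def w1_1d(u_sorted, v_sorted):
    """1D Wasserstein-1 distance between two sorted samples with uniform
    weights, computed as the integral of |CDF_u - CDF_v|."""
    grid = np.sort(np.concatenate([u_sorted, v_sorted]))
    widths = np.diff(grid)
    u_cdf = np.searchsorted(u_sorted, grid[:-1], side="right") / len(u_sorted)
    v_cdf = np.searchsorted(v_sorted, grid[:-1], side="right") / len(v_sorted)
    return float(np.sum(np.abs(u_cdf - v_cdf) * widths))


def mean_cross_w1(A, B):
    """Average 1D W1 distance over all pairs (row of A, row of B)."""
    return float(np.mean([[w1_1d(a, b) for b in B] for a in A]))


def naive_anchor_energy(X, Y):
    """Energy distance between the two sets of per-point distance
    distributions (assumed form: 2 E[d(a,b)] - E[d(a,a')] - E[d(b,b')])."""
    # Row i holds the sorted distances from point i to every point of its cloud.
    A = np.sort(pairwise_distances(X), axis=1)
    B = np.sort(pairwise_distances(Y), axis=1)
    return 2 * mean_cross_w1(A, B) - mean_cross_w1(A, A) - mean_cross_w1(B, B)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 3))
    # The same cloud embedded in a 5-dimensional space: distances are preserved.
    Y = np.hstack([X, np.zeros((30, 2))])
    print(naive_anchor_energy(X, Y))      # close to zero
    print(naive_anchor_energy(X, 3 * X))  # strictly positive
```

Because the representation depends only on within-space distances, the two clouds may live in spaces of different dimension. The nested loop over all pairs of points, each pair costing time linear in the number of points, is what makes this baseline cubic.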