We introduce LOT Wassmap, a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space. The algorithm is motivated by the observation that many datasets are naturally interpreted as probability measures rather than points in $\mathbb{R}^n$, and that finding low-dimensional descriptions of such datasets requires manifold learning algorithms in the Wasserstein space. Most available algorithms are based on computing the pairwise Wasserstein distance matrix, which can be computationally challenging for large datasets in high dimensions. Our algorithm leverages approximation schemes such as Sinkhorn distances and linearized optimal transport to speed up computations, and in particular avoids computing a pairwise distance matrix. We provide guarantees on the embedding quality under such approximations, including when explicit descriptions of the probability measures are not available and one must deal with finite samples instead. Experiments demonstrate that LOT Wassmap attains correct embeddings and that the embedding quality improves with increased sample size. We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
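The pipeline sketched in the abstract can be illustrated on discrete measures: each measure is mapped to a Euclidean vector via the barycentric projection of an (entropic) optimal transport plan from a fixed reference measure, and a standard Euclidean method (here, PCA on the embedding vectors) then recovers a low-dimensional representation without ever forming a pairwise Wasserstein distance matrix. This is a minimal NumPy sketch under those assumptions; the function names, the plain Sinkhorn loop, and the demo data are illustrative and not the paper's implementation.

```python
import numpy as np

def sinkhorn_plan(a, b, C, reg=0.1, n_iter=300):
    # Entropic-regularized OT plan between weight vectors a, b for cost
    # matrix C, computed by alternating Sinkhorn scaling updates.
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def lot_embedding(ref_pts, ref_w, pts, w, reg=0.1):
    # Barycentric projection of the plan maps each reference point toward
    # the target measure; flattening the map gives a Euclidean vector,
    # so measures become points amenable to standard manifold learning.
    C = ((ref_pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    P = sinkhorn_plan(ref_w, w, C, reg)
    return ((P @ pts) / ref_w[:, None]).ravel()

# Demo (illustrative data): point clouds translated along a line form a
# one-parameter family in Wasserstein space; embed them, then run PCA.
rng = np.random.default_rng(0)
ref_pts = rng.normal(size=(40, 2))
ref_w = np.full(40, 1 / 40)
shifts = np.linspace(0.0, 2.0, 20)
emb = np.stack([
    lot_embedding(ref_pts, ref_w, ref_pts + np.array([s, 0.0]), ref_w, reg=0.2)
    for s in shifts
])
# Classical PCA on the embeddings; no pairwise Wasserstein matrix is built.
X = emb - emb.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
coords = X @ Vt[:2].T  # 2-D representation of the 20 measures
```

Each of the 20 measures costs one Sinkhorn solve against the reference, so the number of transport problems grows linearly in the dataset size rather than quadratically as with a full pairwise distance matrix.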