Distribution comparison plays a central role in many machine learning tasks, such as data classification and generative modeling. In this study, we propose a novel metric, called the Hilbert curve projection (HCP) distance, to measure the distance between two probability distributions with high robustness and low complexity. In particular, we first project two high-dimensional probability densities using the Hilbert curve to obtain a coupling between them, and then calculate the transport distance between the two densities in the original space according to this coupling. We show that the HCP distance is a proper metric and is well-defined for absolutely continuous probability measures. Furthermore, we demonstrate that the empirical HCP distance converges to its population counterpart at a rate of no more than $O(n^{-1/2d})$ under regularity conditions. To mitigate the curse of dimensionality, we also develop two variants of the HCP distance using (learnable) subspace projections. Experiments on both synthetic and real-world data show that the HCP distance serves as an effective surrogate for the Wasserstein distance with low complexity, and overcomes the drawbacks of the sliced Wasserstein distance.
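The construction described above can be illustrated on empirical measures: sort each sample set by its position along a space-filling Hilbert curve, pair points by sorted rank (this is the induced coupling), and average the transport cost of the pairs in the original space. The sketch below is a simplified illustration for equal-size samples in $[0,1]^2$, using the classic integer Hilbert-index routine; the function names `hilbert_index` and `hcp_distance` are our own, not from the paper's released code.

```python
import numpy as np

def hilbert_index(order, x, y):
    """Map integer grid coordinates (x, y) to their index along a
    2-D Hilbert curve of the given order (classic xy-to-d routine)."""
    d = 0
    s = 1 << (order - 1)
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate the quadrant so the curve stays continuous
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s >>= 1
    return d

def hcp_distance(X, Y, order=10, p=2):
    """Hedged sketch of the empirical HCP distance between two
    equal-size samples X, Y in [0, 1]^2: sort each sample by its
    Hilbert-curve index (this induces the coupling), then average
    the order-p transport cost of the paired points."""
    side = 1 << order

    def hilbert_order(Z):
        # Quantize to the curve's integer grid and sort by curve index.
        grid = np.clip((Z * side).astype(int), 0, side - 1)
        keys = [hilbert_index(order, int(gx), int(gy)) for gx, gy in grid]
        return np.argsort(keys, kind="stable")

    Xs, Ys = X[hilbert_order(X)], Y[hilbert_order(Y)]
    costs = np.sum(np.abs(Xs - Ys) ** p, axis=1)
    return float(np.mean(costs) ** (1.0 / p))
```

Because both samples are ordered by the same curve, the pairing costs $O(n \log n)$ instead of solving a full optimal-transport problem, which is the source of the low complexity claimed above.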