Comparing probability distributions is an indispensable and ubiquitous task in machine learning and statistics. The most common way to compare a pair of Borel probability measures is to compute a metric between them, and by far the most widely used notions of metric are the Wasserstein metric and the total variation metric. The next most common way is to compute a divergence between them, and in this case almost all known divergences, such as those of Kullback--Leibler, Jensen--Shannon, R\'enyi, and many more, are special cases of the $f$-divergence. Nevertheless, these metrics and divergences can only be computed, and in fact are only defined, when the pair of probability measures is on spaces of the same dimension. How would one quantify, say, a KL-divergence between the uniform distribution on the interval $[-1,1]$ and a Gaussian distribution on $\mathbb{R}^3$? We will show that, in a completely natural manner, various common notions of metrics and divergences give rise to a distance between Borel probability measures defined on spaces of different dimensions, e.g., one on $\mathbb{R}^m$ and another on $\mathbb{R}^n$ where $m, n$ are distinct, so as to give a meaningful answer to the previous question.
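To make the obstruction concrete, recall the standard definition of the $f$-divergence (stated here in common notation, which need not match the notation used later): for a convex function $f\colon(0,\infty)\to\mathbb{R}$ with $f(1)=0$ and Borel probability measures $\mu,\nu$ on a common measurable space with $\mu\ll\nu$,
\[
D_f(\mu \,\|\, \nu) \;=\; \int f\!\left(\frac{d\mu}{d\nu}\right)\, d\nu,
\]
with $f(t)=t\log t$ recovering the Kullback--Leibler divergence. The definition presupposes a common underlying space on which the Radon--Nikodym derivative $d\mu/d\nu$ and the integral make sense; this is precisely what fails when one measure is supported on $\mathbb{R}^m$ and the other on $\mathbb{R}^n$ with $m\neq n$, as in the question above.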