Comparing probability distributions is an indispensable and ubiquitous task in machine learning and statistics. The most common way to compare a pair of Borel probability measures is to compute a metric between them, and by far the most widely used notions of metric are the Wasserstein metric and the total variation metric. The next most common way is to compute a divergence between them, and in this case almost all known divergences, such as those of Kullback--Leibler, Jensen--Shannon, R\'enyi, and many more, are special cases of the $f$-divergence. Nevertheless, these metrics and divergences can only be computed, and indeed are only defined, when the pair of probability measures lie on spaces of the same dimension. How would one quantify, say, a KL-divergence between the uniform distribution on the interval $[-1,1]$ and a Gaussian distribution on $\mathbb{R}^3$? We show that these common notions of metrics and divergences give rise to natural distances between Borel probability measures defined on spaces of different dimensions, e.g., one on $\mathbb{R}^m$ and another on $\mathbb{R}^n$ with $m \ne n$, so as to give a meaningful answer to the previous question.
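As a minimal numerical illustration of the kind of comparison in question (not the construction developed in the paper, which would optimize over all embeddings), one can fix a single isometric embedding $x \mapsto (x,0,0)$ of $[-1,1]$ into $\mathbb{R}^3$ and compute an empirical 2-Wasserstein distance between the embedded uniform samples and Gaussian samples on $\mathbb{R}^3$ via an optimal assignment:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n = 200

# Sample the uniform distribution on [-1,1] and embed it into R^3
# via one (arbitrary) isometric embedding x -> (x, 0, 0).
u = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([u, np.zeros(n), np.zeros(n)])

# Sample a standard Gaussian on R^3.
Y = rng.standard_normal((n, 3))

# Empirical 2-Wasserstein distance between the two equal-size
# point clouds: optimal assignment on the squared-distance cost.
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
rows, cols = linear_sum_assignment(C)
w2 = np.sqrt(C[rows, cols].mean())
print(f"W2 (fixed embedding): {w2:.3f}")
```

The resulting number depends on the chosen embedding; a well-defined distance between the two measures would need to remove this arbitrariness, e.g., by taking an infimum over embeddings.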