The Wasserstein distance provides a notion of dissimilarity between probability measures, which has found recent applications in learning from structured data of varying size, such as images and text documents. In this work, we study the $k$-nearest neighbor classifier ($k$-NN) of probability measures under the Wasserstein distance. We show that the $k$-NN classifier is not universally consistent on the space of measures supported in $(0,1)$. As any Euclidean ball contains a copy of $(0,1)$, one should not expect to obtain universal consistency without some restriction on the base metric space, or on the Wasserstein space itself. To this end, via the notion of $\sigma$-finite metric dimension, we show that the $k$-NN classifier is universally consistent on spaces of measures supported in a $\sigma$-uniformly discrete set. In addition, by studying the geodesic structures of the Wasserstein spaces for $p=1$ and $p=2$, we show that the $k$-NN classifier is universally consistent on the space of measures supported on a finite set, the space of Gaussian measures, and the space of measures with densities expressed as finite wavelet series.
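To make the object of study concrete, the following is a minimal sketch of the $k$-NN classifier on empirical measures under the 1-Wasserstein distance. The function name and toy data are illustrative, not from the paper; `scipy.stats.wasserstein_distance` handles only the one-dimensional case, which suffices for measures supported in $(0,1)$.

```python
from collections import Counter

import numpy as np
from scipy.stats import wasserstein_distance

def knn_wasserstein_predict(train_samples, train_labels, query, k=3):
    """Classify a query measure (represented by a 1-D sample) by majority
    vote among its k nearest training measures in 1-Wasserstein distance."""
    dists = [wasserstein_distance(query, s) for s in train_samples]
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: empirical measures supported in (0,1), drawn from two
# distinct Beta distributions, one per class.
rng = np.random.default_rng(0)
train_samples = ([rng.beta(2, 5, size=50) for _ in range(10)]
                 + [rng.beta(5, 2, size=50) for _ in range(10)])
train_labels = [0] * 10 + [1] * 10

query = rng.beta(2, 5, size=50)  # drawn from the class-0 distribution
print(knn_wasserstein_predict(train_samples, train_labels, query, k=3))
```

Each "data point" here is itself a probability measure (an empirical distribution of 50 samples), and the classifier compares measures directly in Wasserstein distance rather than embedding them in a fixed-dimensional feature space.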