The Wasserstein distance provides a notion of dissimilarity between probability measures, with recent applications to learning from structured data of varying size, such as images and text documents. In this work, we analyze the $k$-nearest neighbor classifier ($k$-NN) under the Wasserstein distance and establish its universal consistency on families of distributions. Using previously known results on the consistency of the $k$-NN classifier on infinite-dimensional metric spaces, it suffices to show that each family is a countable union of finite-dimensional sets. As a result, we show that the $k$-NN classifier is universally consistent on the space of finitely supported measures, the space of Gaussian measures, and the space of measures with finite wavelet densities. In addition, we give a counterexample showing that universal consistency fails on $\mathcal{W}_p((0,1))$.
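As a concrete illustration of the distance underlying the classifier, the following is a minimal sketch of the $1$-Wasserstein distance between two finitely supported measures on the real line, computed as the integral of the absolute difference of their CDFs. The function name and the example measures are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
def wasserstein1(supp_a, w_a, supp_b, w_b):
    """1-Wasserstein distance between two finitely supported measures
    on the real line, via the closed form W_1(mu, nu) = ∫ |F_mu - F_nu| dx,
    where F_mu, F_nu are the (piecewise-constant) CDFs."""
    # All support points, sorted; the CDF difference is constant between them.
    points = sorted(set(supp_a) | set(supp_b))

    def cdf(supp, w, x):
        # Cumulative mass at or below x.
        return sum(wi for si, wi in zip(supp, w) if si <= x)

    total = 0.0
    for left, right in zip(points, points[1:]):
        total += abs(cdf(supp_a, w_a, left) - cdf(supp_b, w_b, left)) * (right - left)
    return total


# Two finitely supported measures: support points with their weights.
mu_support, mu_weights = [0.0, 1.0, 2.0], [0.5, 0.25, 0.25]
nu_support, nu_weights = [0.5, 1.5], [0.5, 0.5]

print(wasserstein1(mu_support, mu_weights, nu_support, nu_weights))  # → 0.5
```

A $k$-NN classifier over such measures would use this pairwise distance in place of the usual Euclidean metric; the space of finitely supported measures is one of the families on which the abstract claims universal consistency.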