Considering two random variables with different laws, to which we only have access through finite-size i.i.d. samples, we address how to reweight the first sample so that its empirical distribution converges to the true law of the second sample as the size of both samples goes to infinity. We study an optimal reweighting that minimizes the Wasserstein distance between the empirical measures of the two samples, and which leads to an expression of the weights in terms of Nearest Neighbors. Consistency and some asymptotic convergence rates in terms of expected Wasserstein distance are derived, without requiring the assumption of absolute continuity of one random variable with respect to the other. These results have applications in Uncertainty Quantification for decoupled estimation and in bounding the generalization error of Nearest Neighbor Regression under covariate shift.
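As a minimal sketch of the Nearest Neighbor form of the weights described above: with the Wasserstein-optimal coupling between the two empirical measures, each point of the second sample sends its mass to its nearest neighbor in the first sample, so the weight of a point in the first sample is the fraction of points of the second sample for which it is the nearest neighbor. The function name `nn_reweight` and the brute-force NumPy implementation are illustrative assumptions, not the paper's code.

```python
import numpy as np

def nn_reweight(X, Y):
    """Weight each row of X (first sample, shape (n, d)) by the
    fraction of rows of Y (second sample, shape (m, d)) whose
    nearest neighbor in X is that row. Illustrative sketch only."""
    # Pairwise Euclidean distances between every y_j and every x_i.
    dists = np.linalg.norm(Y[:, None, :] - X[None, :, :], axis=2)
    # Index of the nearest x_i for each y_j.
    nn_idx = dists.argmin(axis=1)
    # Each y_j contributes mass 1/m to its nearest neighbor in X.
    counts = np.bincount(nn_idx, minlength=len(X))
    return counts / len(Y)

# Example: two points in X; two of the three Y points fall nearest x_0.
X = np.array([[0.0], [1.0]])
Y = np.array([[0.1], [0.2], [0.9]])
w = nn_reweight(X, Y)  # weights [2/3, 1/3], summing to 1
```

The resulting weighted empirical measure on X is the closest (in Wasserstein distance) among all reweightings of X to the empirical measure of Y, which is the object whose consistency and rates the abstract describes.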