We develop a kernel projected Wasserstein distance for the two-sample test, an essential building block in statistics and machine learning: given two sets of samples, determine whether they are drawn from the same distribution. The method finds the nonlinear mapping of the data space that maximizes the distance between the projected distributions. Compared with existing work on the projected Wasserstein distance, the proposed method circumvents the curse of dimensionality more efficiently. We present practical algorithms for computing this distance, together with non-asymptotic uncertainty quantification of its empirical estimates. Numerical examples validate our theoretical results and demonstrate good performance of the proposed method.
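To make the idea concrete, the following is a minimal sketch of a two-sample test built on a kernel projected Wasserstein statistic. It is not the algorithm of the paper, and it rests on several simplifying assumptions: the projection is one-dimensional, the kernel is Gaussian with a hand-picked bandwidth, both samples have the same size, the maximization over the unit ball of the RKHS is replaced by a crude random search over coefficient vectors, and the test is calibrated by permutation rather than by a non-asymptotic bound. All function and parameter names (`kernel_projected_w1`, `n_candidates`, and so on) are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def wasserstein_1d(u, v):
    """1-Wasserstein distance between two equal-size 1-D empirical samples.

    For equal sample sizes this is the mean absolute difference of the
    order statistics.
    """
    return float(np.mean(np.abs(np.sort(u) - np.sort(v))))


def kernel_projected_w1(x, y, bandwidth=1.0, n_candidates=200, seed=0):
    """Approximate sup over unit-norm RKHS functions f of W1(f(x), f(y)).

    Assumption: f(.) = sum_i alpha_i k(., z_i) with z the pooled sample, and
    the supremum is approximated by random search over alpha (a stand-in for
    a proper optimizer).
    """
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    gram = gaussian_kernel(z, z, bandwidth)
    n = len(x)
    best = 0.0
    for _ in range(n_candidates):
        alpha = rng.standard_normal(len(z))
        # Rescale so that the RKHS norm sqrt(alpha^T K alpha) equals 1.
        rkhs_norm = np.sqrt(max(alpha @ gram @ alpha, 1e-12))
        projected = gram @ (alpha / rkhs_norm)
        best = max(best, wasserstein_1d(projected[:n], projected[n:]))
    return best


def permutation_test(x, y, n_permutations=100, seed=0, **kwargs):
    """Return the observed statistic and a permutation p-value."""
    rng = np.random.default_rng(seed)
    observed = kernel_projected_w1(x, y, seed=seed, **kwargs)
    z = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(z))
        stat = kernel_projected_w1(z[perm[:n]], z[perm[n:]], seed=seed, **kwargs)
        exceed += int(stat >= observed)
    # The +1 terms count the observed arrangement among the permutations.
    return observed, (exceed + 1) / (n_permutations + 1)
```

Calling `permutation_test(x, y)` on two arrays of shape `(n, d)` returns the statistic and a p-value; a small p-value suggests the samples come from different distributions. A gradient-based or manifold optimizer would normally replace the random search, which is used here only to keep the sketch short and dependency-free.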