In this paper, we study statistical inference for the Wasserstein distance, which has attracted much attention and has been applied to various machine learning tasks. Several inference methods have been proposed in the literature, but almost all of them rely on asymptotic approximations and lack finite-sample validity. We propose an exact (non-asymptotic) inference method for the Wasserstein distance, inspired by the concept of conditional Selective Inference (SI). To our knowledge, this is the first method that provides a valid confidence interval (CI) for the Wasserstein distance with a finite-sample coverage guarantee, and it applies not only to one-dimensional problems but also to multi-dimensional ones. We evaluate the performance of the proposed method on both synthetic and real-world datasets.