Optimal transport (OT)-based data analysis often faces the issue that the underlying cost function is (partially) unknown. This paper derives distributional limits for the empirical OT value when both the cost function and the measures are estimated from data. Understanding the fluctuations of such quantities is paramount for statistical inference and for stability analysis. Our results apply directly to goodness-of-fit testing for group families, to machine learning applications in which invariant transport costs arise, to estimating the distance between mixtures of distributions, and to the analysis of empirical sliced OT quantities. The distributional limits are established under one of two assumptions: either the cost process converges weakly in uniform norm, or the cost is determined by optimizing the OT value over a fixed parameter space. In the first setting, we rely on careful lower and upper bounds for the OT value in terms of the measures and the cost, in conjunction with a Skorokhod representation. The second setting is based on a functional delta method for the OT value process over the parameter space. The proof techniques might be of independent interest.
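To make the object of study concrete, the following is a minimal sketch of a plug-in OT value in which the cost itself depends on the data. It is an illustration under stated assumptions, not the construction used in the paper: both samples have the same size and uniform weights (so the OT problem reduces to an assignment problem), and the names `empirical_ot_value` and `centered_sq_cost` are hypothetical. The estimated cost here is a squared Euclidean distance after centering each sample by its own mean, chosen only as a simple example of a translation-invariant cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def empirical_ot_value(x, y, cost):
    """Plug-in OT value between two equal-size samples with uniform weights.

    With n points on each side and weights 1/n, an optimal coupling can be
    taken to be a permutation, so the problem is solved as an assignment.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    # Pairwise cost matrix of shape (n, n) via broadcasting.
    cost_matrix = cost(x[:, None, :], y[None, :, :])
    rows, cols = linear_sum_assignment(cost_matrix)
    return cost_matrix[rows, cols].mean()


def sq_cost(a, b):
    """Fixed (known) cost: squared Euclidean distance."""
    return np.sum((a - b) ** 2, axis=-1)


def centered_sq_cost(x, y):
    """Illustrative estimated cost (an assumption of this sketch):
    squared Euclidean distance after subtracting each sample mean."""
    mean_x, mean_y = x.mean(axis=0), y.mean(axis=0)

    def cost(a, b):
        return np.sum(((a - mean_x) - (b - mean_y)) ** 2, axis=-1)

    return cost


rng = np.random.default_rng(0)
x = rng.normal(size=(20, 2))
# y is a shuffled copy of x shifted by 3 in every coordinate.
y = x[rng.permutation(20)] + 3.0

value_fixed_cost = empirical_ot_value(x, y, sq_cost)
value_estimated_cost = empirical_ot_value(x, y, centered_sq_cost(x, y))
```

In this toy example the fixed squared cost registers the shift between the samples, while the data-dependent centered cost removes it. The distributional limits discussed above concern how values of this kind fluctuate when both the measures and the cost are estimated.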