Optimal transport (OT) is a versatile framework for comparing probability measures, with many applications in statistics, machine learning, and applied mathematics. However, OT distances scale poorly to high dimensions, both computationally and statistically, which has motivated the study of regularized OT methods such as slicing, smoothing, and entropic penalization. This work establishes a unified framework for deriving limit distributions of empirical regularized OT distances, semiparametric efficiency of the plug-in empirical estimator, and bootstrap consistency. We apply the unified framework to provide a comprehensive statistical treatment of: (i) average- and max-sliced $p$-Wasserstein distances, for which several gaps in the existing literature are closed; (ii) smooth distances with compactly supported kernels, the analysis of which is motivated by computational considerations; and (iii) entropic OT, for which our method generalizes existing limit distribution results and establishes, for the first time, efficiency and bootstrap consistency. While our focus is on these three regularized OT distances as applications, the flexibility of the proposed framework renders it applicable to broad classes of functionals beyond these examples.
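To make the sliced distances in item (i) concrete, the following is a minimal sketch of plug-in empirical estimates, not the method of this work. It assumes two samples of equal size, approximates the average over directions by Monte Carlo, and replaces the supremum in the max-sliced distance by a maximum over the same random directions, which only gives a lower bound on the true max-sliced value. The function name `sliced_wasserstein` and its parameters are hypothetical.

```python
import numpy as np


def sliced_wasserstein(x, y, p=2, n_projections=200, seed=0):
    """Monte Carlo plug-in estimates of the average- and max-sliced
    p-Wasserstein distances between samples x and y of shape (n, d).

    Assumption: x and y contain the same number of points, so each
    one-dimensional OT problem is solved by matching order statistics.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]

    # Directions drawn uniformly from the unit sphere in R^d.
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project both samples onto every direction and sort each column.
    x_proj = np.sort(x @ theta.T, axis=0)
    y_proj = np.sort(y @ theta.T, axis=0)

    # W_p^p between the projected empirical measures, one value per direction.
    wpp = np.mean(np.abs(x_proj - y_proj) ** p, axis=0)

    avg_sliced = np.mean(wpp) ** (1.0 / p)
    # Maximum over sampled directions only: a lower bound on the max-sliced distance.
    max_sliced = np.max(wpp) ** (1.0 / p)
    return avg_sliced, max_sliced


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((500, 5))
    y = rng.standard_normal((500, 5)) + 0.5
    print(sliced_wasserstein(x, y))
```

Each projection costs one sort, so the whole estimate runs in roughly O(n_projections · n log n) time, which illustrates the computational appeal of slicing mentioned above.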