We propose and investigate several statistical models and corresponding sampling schemes for data analysis based on unbalanced optimal transport (UOT) between finitely supported measures. Specifically, we analyse Kantorovich-Rubinstein (KR) distances with penalty parameter $C>0$. The main result provides non-asymptotic bounds on the expected error for the empirical KR distance as well as for its barycenters. The impact of the penalty parameter $C$ is studied in detail. Our approach justifies randomised computational schemes for UOT which can be used for fast approximate computations in combination with any exact solver. Using synthetic and real datasets, we empirically analyse the behaviour of the expected errors in simulation studies and illustrate the validity of our theoretical bounds.