Estimating optimal transport (OT) maps (a.k.a. Monge maps) between two measures $P$ and $Q$ is a problem fraught with computational and statistical challenges. A promising approach lies in using the dual potential functions obtained when solving an entropy-regularized OT problem between samples $P_n$ and $Q_n$, which can be used to recover an approximately optimal map. The negentropy penalization in that scheme introduces, however, an estimation bias that grows with the regularization strength. A well-known remedy to debias such estimates, which has gained wide popularity among practitioners of regularized OT, is to center them, by subtracting auxiliary problems involving $P_n$ and itself, as well as $Q_n$ and itself. We do prove that, under favorable conditions on $P$ and $Q$, debiasing can yield better approximations to the Monge map. However, and perhaps surprisingly, we present a few cases in which debiasing is provably detrimental in a statistical sense, notably when the regularization strength is large or the number of samples is small. These claims are validated experimentally on synthetic and real datasets, and should reopen the debate on whether debiasing is needed when using entropic optimal transport.
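To make the two estimators discussed above concrete, here is a minimal NumPy sketch under stated assumptions: the cost is $\frac{1}{2}\|x-y\|^2$, the samples carry uniform weights, and all function names (`sinkhorn_potentials`, `entropic_map`, `debiased_map`) are hypothetical rather than taken from the paper or from any library. The plain entropic map is the barycentric projection built from the dual potential on $Q_n$. The debiased variant shown here, $T^{P_n Q_n}_\varepsilon(x) - T^{P_n P_n}_\varepsilon(x) + x$, is one reading of the centering step: it comes from differentiating the centered potential in $x$, where the $Q_n$-versus-$Q_n$ term does not depend on $x$ and so drops out. It may differ from the exact estimator the authors analyze.

```python
import numpy as np

def sq_cost(x, y):
    """Pairwise cost c(x, y) = 0.5 * ||x - y||^2, returned with shape (n, m)."""
    diff = x[:, None, :] - y[None, :, :]
    return 0.5 * np.sum(diff ** 2, axis=-1)

def logsumexp(a, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def sinkhorn_potentials(x, y, eps, n_iter=500):
    """Log-domain Sinkhorn with uniform weights; returns dual potentials (f, g)."""
    n, m = len(x), len(y)
    C = sq_cost(x, y)
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        f = -eps * logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    return f, g

def entropic_map(x_new, y, g, eps):
    """Barycentric projection: softmax-weighted average of the target samples.

    Uniform target weights are assumed, so they cancel in the normalization.
    """
    logits = (g[None, :] - sq_cost(x_new, y)) / eps
    logits -= logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return w @ y

def debiased_map(x_new, x, y, eps, n_iter=500):
    """Assumed debiased estimator: T_PQ(x) - T_PP(x) + x."""
    _, g_xy = sinkhorn_potentials(x, y, eps, n_iter)
    _, g_xx = sinkhorn_potentials(x, x, eps, n_iter)
    return entropic_map(x_new, y, g_xy, eps) - entropic_map(x_new, x, g_xx, eps) + x_new

# Usage on a toy translation problem: Q_n is P_n shifted by a fixed vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(40, 2))
shift = np.array([1.0, -0.5])
y = x + shift
x_new = rng.normal(size=(5, 2))
_, g = sinkhorn_potentials(x, y, eps=1.0)
print("plain error   :", np.linalg.norm(entropic_map(x_new, y, g, 1.0) - (x_new + shift)))
print("debiased error:", np.linalg.norm(debiased_map(x_new, x, y, 1.0) - (x_new + shift)))
```

The toy translation problem is a deliberately favorable case: the shrinkage induced by the regularization is the same in both problems, so the centering cancels it. It does not illustrate the regimes (large regularization, few samples) in which debiasing is claimed to hurt.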