We introduce the weak barycenter of a family of probability distributions, based on the recently developed notion of optimal weak transport of measures (arXiv:1412.7480v4). We provide a theoretical analysis of the weak barycenter and its relationship to the classical Wasserstein barycenter, and discuss its meaning in light of the convex ordering between probability measures. In particular, we argue that, rather than averaging the information of the input distributions as the usual optimal transport barycenters do, weak barycenters contain geometric information shared across all input distributions, which can be interpreted as a latent random variable affecting all the measures. We also provide iterative algorithms to compute a weak barycenter for either finite or infinite families of arbitrary measures (with finite moments of order 2). These algorithms are particularly well suited to the streaming setting, i.e., when measures arrive sequentially. Notably, our streaming computation of weak barycenters does not require smoothing the empirical measures or defining a common grid for them, as some previous approaches to Wasserstein barycenters do. The concept of weak barycenter and our computational approaches are illustrated on synthetic examples, validated on 2D real-world data, and compared to classical Wasserstein barycenters.