Collecting and aggregating information from several probability measures or histograms is a fundamental task in machine learning. A popular approach is to compute the barycenter of the probability measures under the Wasserstein metric. However, approximating the Wasserstein barycenter is numerically challenging because of the curse of dimensionality. This paper proposes the projection robust Wasserstein barycenter (PRWB), which has the potential to mitigate the curse of dimensionality. Because the PRWB is itself numerically very challenging to compute, we further propose a relaxed PRWB (RPRWB) model, which is more tractable. The RPRWB projects the probability measures onto a lower-dimensional subspace that maximizes the Wasserstein barycenter objective. The resulting problem is a max-min problem over the Stiefel manifold. By combining the iterative Bregman projection algorithm with Riemannian optimization, we propose two new algorithms for computing the RPRWB. We analyze the arithmetic-operation complexity of the proposed algorithms for obtaining an $\epsilon$-stationary solution. We incorporate the RPRWB into a discrete distribution clustering algorithm, and numerical results on real text datasets confirm that the RPRWB model helps improve clustering performance significantly.
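To make the max-min structure concrete, the following is a minimal NumPy sketch and not either of the two algorithms proposed in the paper, whose details the abstract does not give. It assumes a fixed barycenter support `X`, a squared Euclidean cost, entropic regularization with parameter `reg`, a fixed step size, and a QR retraction; all function names and parameters are hypothetical. The inner minimization is approximated by iterative Bregman projection, and the outer maximization takes one Riemannian gradient ascent step on the Stiefel manifold per outer iteration.

```python
import numpy as np

def projected_cost(X, Y, U):
    """Squared Euclidean cost between rows of X and Y after projection by U."""
    XU, YU = X @ U, Y @ U
    sq = (XU**2).sum(1)[:, None] + (YU**2).sum(1)[None, :] - 2.0 * XU @ YU.T
    return np.maximum(sq, 0.0)  # clip tiny negatives caused by round-off

def ibp_barycenter(costs, hists, weights, reg, n_iter=200):
    """Entropic fixed-support barycenter via iterative Bregman projection."""
    kernels = [np.exp(-C / reg) for C in costs]
    u = [np.ones(C.shape[0]) for C in costs]
    for _ in range(n_iter):
        # Projection 1: match each plan's column marginal to its histogram.
        v = [p / (K.T @ uk) for K, p, uk in zip(kernels, hists, u)]
        Kv = [K @ vk for K, vk in zip(kernels, v)]
        # Projection 2: common row marginal q is a weighted geometric mean.
        q = np.exp(sum(w * np.log(uk * Kvk) for w, uk, Kvk in zip(weights, u, Kv)))
        u = [q / Kvk for Kvk in Kv]
    plans = [uk[:, None] * K * vk[None, :] for uk, K, vk in zip(u, kernels, v)]
    return q, plans

def second_moment(X, Y, plan):
    """V = sum_ij plan_ij (x_i - y_j)(x_i - y_j)^T, so <plan, cost> = tr(U^T V U)."""
    r, c = plan.sum(1), plan.sum(0)
    cross = X.T @ plan @ Y
    return X.T @ (r[:, None] * X) + Y.T @ (c[:, None] * Y) - cross - cross.T

def stiefel_ascent_step(U, V, step):
    """One Riemannian gradient ascent step for tr(U^T V U) with QR retraction."""
    G = 2.0 * V @ U                           # Euclidean gradient
    sym = 0.5 * (U.T @ G + G.T @ U)
    rgrad = G - U @ sym                       # projection onto the tangent space
    Q, R = np.linalg.qr(U + step * rgrad)     # retraction back to the manifold
    return Q * np.sign(np.diag(R))            # fix column signs of the Q factor

def rprwb_sketch(X, supports, hists, weights, k, reg=0.5, step=0.1,
                 n_outer=20, seed=0):
    """Alternate inner IBP solves with outer Stiefel ascent steps (illustrative)."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((X.shape[1], k)))
    for _ in range(n_outer):
        costs = [projected_cost(X, Y, U) for Y in supports]
        _, plans = ibp_barycenter(costs, hists, weights, reg)
        V = sum(w * second_moment(X, Y, P)
                for w, Y, P in zip(weights, supports, plans))
        U = stiefel_ascent_step(U, V, step)
    # Recompute the barycenter weights for the final projection.
    costs = [projected_cost(X, Y, U) for Y in supports]
    q, _ = ibp_barycenter(costs, hists, weights, reg)
    return U, q
```

Treating the inner plans as fixed when differentiating with respect to `U` is a simplification made here for brevity. A practical implementation would also need a stopping rule based on stationarity and a log-domain IBP for small `reg`.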