The Wasserstein barycenter is a geometric construct that captures the notion of centrality among probability distributions and has found many applications in machine learning. However, most algorithms for computing even an approximate barycenter suffer from an exponential dependence on the dimension $d$ of the underlying space of the distributions. To cope with this "curse of dimensionality," we study dimensionality reduction techniques for the Wasserstein barycenter problem. When the barycenter is restricted to a support of size $n$, we show that randomized dimensionality reduction can be used to map the problem to a space of dimension $O(\log n)$, independent of both $d$ and the number of input distributions $k$, and that \emph{any} solution found in the reduced dimension will have its cost preserved up to arbitrarily small error in the original space. We provide matching upper and lower bounds on the size of the reduced dimension, showing that our methods are optimal up to constant factors. We also provide a coreset construction for the Wasserstein barycenter problem that significantly decreases the number of input distributions. The coresets can be used in conjunction with random projections, further improving computation time. Lastly, our experimental results validate the speedup provided by dimensionality reduction while maintaining solution quality.
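To make the pipeline described above concrete, the following is a minimal sketch of solving the barycenter problem after a Johnson-Lindenstrauss-style random projection. It assumes the POT library's `ot.lp.free_support_barycenter` as the reduced-space solver, synthetic Gaussian inputs, and an illustrative target dimension $m = \lceil 8 \ln n \rceil$; the solver choice, data, and constant are assumptions for illustration, not the paper's exact algorithm or parameters.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot); assumed solver

rng = np.random.default_rng(0)

# k synthetic input distributions, each supported on s points in R^d.
k, s, d, n = 10, 50, 1000, 20                  # n = barycenter support size
dists = [rng.normal(size=(s, d)) for _ in range(k)]
weights = [np.full(s, 1.0 / s) for _ in range(k)]

# Random Gaussian projection to m = O(log n) dimensions; the constant 8
# is an illustrative choice, not the paper's bound.
m = int(np.ceil(8 * np.log(n)))
G = rng.normal(size=(d, m)) / np.sqrt(m)       # scaled Gaussian JL map
dists_low = [X @ G for X in dists]

# Solve a size-n free-support barycenter entirely in the reduced space.
X_init = rng.normal(size=(n, m))
bary_low = ot.lp.free_support_barycenter(dists_low, weights, X_init)
print(bary_low.shape)                          # (n, m)
```

Per the guarantee stated above, any solution found in the reduced space has its cost preserved up to arbitrarily small error in the original space; one way to recover support points in $\mathbb{R}^d$ is to average the original points of each input distribution according to the optimal transport plans of the low-dimensional solution.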