Minimizing functionals in the space of probability distributions can be done with Wasserstein gradient flows. To solve them numerically, a possible approach is to rely on the Jordan-Kinderlehrer-Otto (JKO) scheme, which is analogous to the proximal scheme in Euclidean spaces. However, this bilevel optimization problem is known for its computational challenges, especially in high dimension. To alleviate them, very recent works propose to approximate the JKO scheme by leveraging Brenier's theorem and using gradients of Input Convex Neural Networks to parameterize the density (JKO-ICNN). However, this method comes with a high computational cost and stability issues. Instead, this work proposes to use gradient flows in the space of probability measures endowed with the sliced-Wasserstein (SW) distance. We argue that this method is more flexible than JKO-ICNN, since SW enjoys a closed-form differentiable approximation. Thus, the density at each step can be parameterized by any generative model, which alleviates the computational burden and makes it tractable in higher dimensions. Interestingly, we also show empirically that these gradient flows are strongly related to the usual Wasserstein gradient flows, and that they can be used to efficiently minimize diverse machine learning functionals.
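For reference, the JKO scheme mentioned above is the standard Wasserstein analogue of the Euclidean proximal step: given a functional $\mathcal{F}$ over probability measures and a step size $\tau > 0$, one iterates

$$\rho_{k+1} \in \operatorname*{argmin}_{\rho \in \mathcal{P}_2(\mathbb{R}^d)} \; \mathcal{F}(\rho) + \frac{1}{2\tau} W_2^2(\rho, \rho_k),$$

which mirrors the proximal update $x_{k+1} = \operatorname*{argmin}_x F(x) + \frac{1}{2\tau} \|x - x_k\|^2$ in Euclidean space. (This is the textbook formulation of the scheme, not wording taken from this abstract.)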
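To illustrate the closed-form differentiable approximation of SW referred to above, below is a minimal Monte Carlo sketch in PyTorch: random directions are drawn on the unit sphere, both samples are projected onto each direction, and the one-dimensional Wasserstein-2 distances are computed in closed form by sorting. The function name, the equal-size sample assumption, and the `n_projections` default are illustrative choices, not taken from the paper.

```python
import torch

def sliced_wasserstein_sq(x, y, n_projections=100):
    """Monte Carlo estimate of the squared sliced-Wasserstein-2 distance
    between two empirical distributions given as equal-size samples
    x, y of shape (n, d). Differentiable with respect to x and y.

    Illustrative sketch; assumes uniform weights and n matching samples.
    """
    d = x.shape[1]
    # Draw random directions uniformly on the unit sphere S^{d-1}.
    theta = torch.randn(n_projections, d, device=x.device)
    theta = theta / theta.norm(dim=1, keepdim=True)
    # Project both samples onto each direction: shape (n, n_projections).
    x_proj = x @ theta.T
    y_proj = y @ theta.T
    # In 1D, W_2^2 between equal-weight empirical measures is the mean
    # squared difference of sorted samples; averaging over directions
    # gives the Monte Carlo estimate of SW_2^2.
    x_sorted, _ = torch.sort(x_proj, dim=0)
    y_sorted, _ = torch.sort(y_proj, dim=0)
    return ((x_sorted - y_sorted) ** 2).mean()

# Usage: gradients flow through sorting and projection.
x = torch.randn(256, 10, requires_grad=True)
y = torch.randn(256, 10)
loss = sliced_wasserstein_sq(x, y)
loss.backward()  # populates x.grad
```

Because the estimate is differentiable, it can be backpropagated through the samples of any generative model, which is what makes the SW-based flows discussed above tractable in higher dimensions.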