Wasserstein gradient flow has emerged as a promising approach for solving optimization problems over the space of probability distributions. A recent trend is to use the well-known JKO scheme in combination with input convex neural networks to numerically implement the proximal step. The most challenging step in this setup is evaluating functions that involve the density explicitly, such as entropy, from samples. This paper builds on these recent works with a slight but crucial difference: we propose to utilize a variational formulation of the objective function, expressed as a maximization over a parametric class of functions. Theoretically, the proposed variational formulation allows the construction of gradient flows directly for empirical distributions with a well-defined and meaningful objective function. Computationally, this approach replaces the expensive step that existing methods use to handle density-dependent objective functions with inner-loop updates that require only a small batch of samples and scale well with the dimension. The performance and scalability of the proposed method are illustrated through several numerical experiments involving high-dimensional synthetic and real datasets.
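To make the central idea concrete, below is a minimal sketch (assuming PyTorch) of estimating a density-dependent objective from samples alone: the KL divergence via its Donsker-Varadhan variational formulation, maximized over a parametric critic with small-batch inner-loop updates. The names `Critic`, `variational_kl`, and `estimate_kl` are illustrative placeholders rather than the paper's code, and KL is only one instance of the objectives such a variational formulation covers.

```python
import math
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Parametric test function h over which the variational bound is maximized."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

def variational_kl(critic, x_p, x_q):
    # Donsker-Varadhan bound: KL(P || Q) = sup_h  E_P[h] - log E_Q[exp(h)],
    # with both expectations replaced by Monte Carlo averages over samples.
    h_p = critic(x_p).squeeze(-1)
    h_q = critic(x_q).squeeze(-1)
    return h_p.mean() - (torch.logsumexp(h_q, dim=0) - math.log(len(x_q)))

def estimate_kl(x_p, x_q, steps=300, batch=128, lr=1e-3):
    critic = Critic(x_p.shape[1])
    opt = torch.optim.Adam(critic.parameters(), lr=lr)
    for _ in range(steps):
        # Inner-loop update using only a small minibatch from each measure.
        ip = torch.randint(len(x_p), (batch,))
        iq = torch.randint(len(x_q), (batch,))
        loss = -variational_kl(critic, x_p[ip], x_q[iq])  # ascend the bound
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return variational_kl(critic, x_p, x_q).item()

if __name__ == "__main__":
    torch.manual_seed(0)
    x_p = torch.randn(4096, 2) + 1.0  # samples from P = N((1,1), I)
    x_q = torch.randn(4096, 2)        # samples from Q = N(0, I)
    print(f"variational KL estimate: {estimate_kl(x_p, x_q):.3f}  (ground truth = 1.0)")
```

Note that no density is evaluated anywhere: both terms of the bound are sample averages, which is what allows gradient flows built on such objectives to act directly on empirical distributions.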