Variational inference (VI) provides an appealing alternative to traditional sampling-based approaches for implementing Bayesian inference due to its conceptual simplicity, statistical accuracy and computational scalability. However, common variational approximation schemes, such as the mean-field (MF) approximation, require a certain conjugacy structure to facilitate efficient computation, which may add unnecessary restrictions to the viable prior distribution family and impose further constraints on the variational approximation family. In this work, we develop a general computational framework for implementing MF-VI via Wasserstein gradient flow (WGF), a gradient flow over the space of probability measures. When specialized to Bayesian latent variable models, we analyze the algorithmic convergence of an alternating minimization scheme based on a time-discretized WGF for implementing the MF approximation. In particular, the proposed algorithm resembles a distributional version of the EM algorithm, consisting of an E-step that updates the variational distribution of the latent variables and an M-step that performs steepest descent over the variational distribution of the parameters. Our theoretical analysis relies on optimal transport theory and subdifferential calculus in the space of probability measures. We prove the exponential convergence of the time-discretized WGF for minimizing a generic objective functional given strict convexity along generalized geodesics. We also provide a new proof of the exponential contraction of the variational distribution obtained from the MF approximation by using the fixed-point equation of the time-discretized WGF. We apply our method and theory to two classic Bayesian latent variable models, the Gaussian mixture model and the mixture of regressions model. Numerical experiments are also conducted to complement the theoretical findings under these two models.
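For orientation, the alternating scheme described above can be sketched in formulas. The notation is assumed here and does not come from the abstract: $x$ denotes the data, $z$ the latent variables, $\theta$ the parameters, $\pi(\cdot \mid x)$ the joint posterior of $(\theta, z)$, $\tau > 0$ a step size, and $W_2$ the 2-Wasserstein distance. The M-step is written as the standard implicit (JKO-type) time discretization of a WGF; the exact update and the ordering of the two steps in the paper may differ.

```latex
% MF objective: KL divergence from the product family to the posterior
\[
  F(q_\theta, q_z) \;=\; \mathrm{KL}\bigl(q_\theta \otimes q_z \,\big\|\, \pi(\cdot \mid x)\bigr).
\]

% E-step: exact minimization over q_z with q_\theta^{(k)} held fixed
\[
  q_z^{(k+1)}(z) \;\propto\; \exp\Bigl( \mathbb{E}_{\theta \sim q_\theta^{(k)}} \bigl[ \log p(x, z \mid \theta) \bigr] \Bigr).
\]

% M-step: one implicit (JKO-type) step of the time-discretized WGF over q_\theta
\[
  q_\theta^{(k+1)} \;\in\; \operatorname*{arg\,min}_{\rho}
  \Bigl\{ F\bigl(\rho, q_z^{(k+1)}\bigr) + \frac{1}{2\tau}\, W_2^2\bigl(\rho, q_\theta^{(k)}\bigr) \Bigr\}.
\]
```

In this sketch the E-step has a closed form without any conjugacy requirement on the prior, while the M-step replaces a closed-form coordinate update with a proximal step in Wasserstein space, which is what allows the variational family for $\theta$ to remain unrestricted.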