Variational inference is a popular alternative to Markov chain Monte Carlo methods that constructs a Bayesian posterior approximation by minimizing a discrepancy to the true posterior within a pre-specified family. This converts Bayesian inference into an optimization problem, enabling the use of simple and scalable stochastic optimization algorithms. However, a key limitation of variational inference is that the optimal approximation is typically not tractable to compute; even in simple settings the problem is nonconvex. Thus, recently developed statistical guarantees -- which all involve the (data) asymptotic properties of the optimal variational distribution -- are not reliably obtained in practice. In this work, we provide two major contributions: a theoretical analysis of the asymptotic convexity properties of variational inference in the popular setting with a Gaussian family; and consistent stochastic variational inference (CSVI), an algorithm that exploits these properties to find the optimal approximation in the asymptotic regime. CSVI consists of a tractable initialization procedure that finds the local basin of the optimal solution, and a scaled gradient descent algorithm that stays locally confined to that basin. Experiments on nonconvex synthetic and real-data examples show that compared with standard stochastic gradient descent, CSVI improves the likelihood of obtaining the globally optimal posterior approximation.