Stochastic gradient methods have enabled variational inference for high-dimensional models and large datasets. However, the steepest ascent direction in the parameter space of a statistical model is given not by the commonly used Euclidean gradient, but by the natural gradient, which premultiplies the Euclidean gradient by the inverse Fisher information matrix. Using natural gradients can improve convergence significantly, but inverting the Fisher information matrix is daunting in high dimensions. In Gaussian variational approximation, natural gradient updates of the natural parameters (expressed in terms of the mean and precision matrix) of the Gaussian distribution can be derived analytically, but they do not ensure that the precision matrix remains positive definite. To tackle this issue, we consider a Cholesky decomposition of the covariance or precision matrix and derive explicit natural gradient updates of the Cholesky factor by finding the inverse of the Fisher information matrix analytically. Compared with natural gradient updates of the natural parameters, updates of the Cholesky factor depend only on the first derivative of the log posterior density rather than the second, which reduces computational cost. Sparsity constraints encoding posterior independence structure can be imposed by fixing the relevant entries of the Cholesky factor to zero.
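For reference, the natural gradient referred to above can be written in the standard form below; the symbols used here are generic placeholders rather than this paper's notation. With variational parameters $\lambda$ of the approximating density $q_{\lambda}$ and objective $\mathcal{L}(\lambda)$ (the evidence lower bound),
\[
\widetilde{\nabla}_{\lambda} \mathcal{L} \;=\; \mathcal{I}(\lambda)^{-1} \nabla_{\lambda} \mathcal{L},
\qquad
\mathcal{I}(\lambda) \;=\; \mathbb{E}_{q_{\lambda}}\!\left[ \nabla_{\lambda} \log q_{\lambda}(\theta)\, \nabla_{\lambda} \log q_{\lambda}(\theta)^{\top} \right],
\]
so each update rescales the Euclidean gradient by the inverse Fisher information matrix $\mathcal{I}(\lambda)^{-1}$ of the variational distribution $q_{\lambda}$.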