Stochastic gradient methods have enabled variational inference for high-dimensional models and large data sets. However, the steepest ascent direction in the parameter space of a statistical model is given not by the commonly used Euclidean gradient, but by the natural gradient, which premultiplies the Euclidean gradient by the inverse Fisher information matrix. Use of natural gradients can improve convergence significantly, but inverting the Fisher information matrix is computationally daunting in high dimensions. In Gaussian variational approximation, natural gradient updates of the natural parameters (expressed in terms of the mean and precision matrix) of the Gaussian distribution can be derived analytically, but they do not ensure that the precision matrix remains positive definite. To tackle this issue, we consider the Cholesky decomposition of the covariance or precision matrix and derive explicit natural gradient updates of the Cholesky factor by finding the inverse of the Fisher information matrix analytically; these updates depend only on the first, rather than the second, derivative of the log posterior density. Efficient natural gradient updates of the Cholesky factor are also derived under sparsity constraints that encode different posterior independence structures.
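As a minimal notational sketch of the update the abstract refers to (the symbols below are assumptions chosen for illustration, not necessarily the paper's notation): writing $q_\lambda$ for the variational density with parameter $\lambda$, $\mathcal{L}(\lambda)$ for the evidence lower bound, and $\rho_t$ for a step size, a natural gradient ascent step takes the form
\[
\lambda^{(t+1)} = \lambda^{(t)} + \rho_t \, F_{\lambda^{(t)}}^{-1} \nabla_\lambda \mathcal{L}\bigl(\lambda^{(t)}\bigr),
\qquad
F_\lambda = \mathbb{E}_{q_\lambda}\!\left[ \nabla_\lambda \log q_\lambda(\theta)\, \nabla_\lambda \log q_\lambda(\theta)^\top \right],
\]
where $F_\lambda$ is the Fisher information matrix of $q_\lambda$. The practical bottleneck is forming and inverting $F_\lambda$, which is what deriving $F_\lambda^{-1}$ analytically for the Cholesky parameterization is meant to avoid.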