Stochastic gradient methods have enabled variational inference for high-dimensional models. However, the steepest ascent direction in the parameter space of a statistical model is given not by the widely used Euclidean gradient but by the natural gradient, which premultiplies the Euclidean gradient by the inverse Fisher information. Using natural gradients can improve convergence, but inverting the Fisher information matrix is daunting in high dimensions. In Gaussian variational approximation, natural gradient updates of the mean and precision of the normal distribution can be derived analytically, but they do not ensure that the precision matrix remains positive definite. To tackle this issue, we consider a Cholesky decomposition of the covariance or precision matrix and derive analytic natural gradient updates of the Cholesky factor, which depend on either the first or the second derivative of the log posterior density. Efficient natural gradient updates of the Cholesky factor are also derived under sparsity constraints representing different posterior correlation structures. As Adam's adaptive learning rate does not work well with natural gradients, we propose stochastic normalized natural gradient ascent with momentum. The efficiency of the proposed methods is demonstrated using logistic regression and generalized linear mixed models.
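For concreteness, the natural gradient referred to above can be written in its standard form (notation assumed here, not fixed by the abstract): with $\lambda$ the variational parameters, $\mathcal{L}(\lambda)$ the variational objective, and $F(\lambda)$ the Fisher information of the variational density $q_\lambda$,
$$
\widetilde{\nabla}_{\lambda}\mathcal{L} \;=\; F(\lambda)^{-1}\,\nabla_{\lambda}\mathcal{L},
\qquad
F(\lambda) \;=\; \mathbb{E}_{q_{\lambda}}\!\left[\nabla_{\lambda}\log q_{\lambda}(\theta)\,\nabla_{\lambda}\log q_{\lambda}(\theta)^{\top}\right],
$$
which makes explicit why the inversion is daunting: for a $d$-dimensional Gaussian approximation, $\lambda$ has $O(d^2)$ entries, so $F(\lambda)$ is an $O(d^2)\times O(d^2)$ matrix. Similarly, one common form of normalized gradient ascent with momentum, given only as a generic sketch and not necessarily the exact update proposed in the paper, is
$$
m_t \;=\; \gamma\, m_{t-1} + (1-\gamma)\,\widetilde{\nabla}_{\lambda}\mathcal{L}(\lambda_{t-1}),
\qquad
\lambda_t \;=\; \lambda_{t-1} + \alpha_t\, \frac{m_t}{\lVert m_t \rVert},
$$
where normalizing by $\lVert m_t \rVert$ replaces the coordinatewise adaptive scaling of Adam, which the abstract notes does not work well with natural gradients.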