Stochastic gradient methods have enabled variational inference for high-dimensional models and large data sets. However, the direction of steepest ascent in the parameter space of a statistical model is given not by the commonly used Euclidean gradient, but by the natural gradient, which premultiplies the Euclidean gradient by the inverse of the Fisher information matrix. Using natural gradients in optimization can significantly improve convergence, but inverting the Fisher information matrix is daunting in high dimensions. The contribution of this article is twofold. First, we derive the natural gradient updates of a Gaussian variational approximation in terms of the mean and the Cholesky factor of the covariance matrix, and show that these updates depend only on the first derivative of the variational objective function. Second, we provide a detailed derivation of the natural gradient updates for structured variational approximations with a minimal conditional exponential family representation, which include highly flexible mixtures of exponential family distributions that can fit skewed or multimodal posteriors. Further experiments will be carried out to evaluate the performance of the proposed methods.
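For reference, the natural gradient referred to above admits the following standard definition; the symbols used here, $\lambda$ (variational parameters), $q_{\lambda}$ (variational density), $\mathcal{L}$ (variational objective), and $F(\lambda)$ (Fisher information), are notation introduced for illustration rather than fixed by the abstract:
$$
\widetilde{\nabla}_{\lambda}\,\mathcal{L} \;=\; F(\lambda)^{-1}\,\nabla_{\lambda}\,\mathcal{L},
\qquad
F(\lambda) \;=\; \mathbb{E}_{q_{\lambda}}\!\left[\nabla_{\lambda}\log q_{\lambda}(\theta)\,\nabla_{\lambda}\log q_{\lambda}(\theta)^{\top}\right].
$$
As one well-known special case, for a Gaussian approximation $q_{\lambda} = N(\mu, \Sigma)$ the Fisher information block for the mean is $\Sigma^{-1}$, so the natural gradient with respect to $\mu$ is simply $\Sigma\,\nabla_{\mu}\mathcal{L}$, illustrating how natural gradient updates can be written without explicitly inverting $F(\lambda)$.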