Stochastic gradient methods have enabled variational inference for high-dimensional models and large data sets. However, the direction of steepest ascent in the parameter space of a statistical model is given not by the commonly used Euclidean gradient, but by the natural gradient, which premultiplies the Euclidean gradient by the inverse of the Fisher information matrix. Use of natural gradients in optimization can improve convergence significantly, but inverting the Fisher information matrix is daunting in high dimensions. The contribution of this article is twofold. First, we derive the natural gradient updates of a Gaussian variational approximation in terms of the mean and the Cholesky factor of the covariance matrix, and show that these updates depend only on the first derivative of the variational objective function. Second, we derive complete natural gradient updates for structured variational approximations with a minimal conditional exponential family representation, which include highly flexible mixtures of exponential family distributions that can fit skewed or multimodal posteriors. These updates, albeit more complex than those presented previously, account fully for the dependence between the mixing distribution and the distributions of the components. Further experiments will be carried out to evaluate the performance of the proposed methods.
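For reference, the natural gradient mentioned above has the standard form below (a minimal statement in generic notation; the symbols $\mathcal{L}$ for the variational objective, $\lambda$ for the variational parameters, and $q_{\lambda}$ for the variational density are illustrative, not necessarily the paper's own):
\[
\tilde{\nabla}_{\lambda} \mathcal{L} = F(\lambda)^{-1} \nabla_{\lambda} \mathcal{L},
\qquad
F(\lambda) = \mathbb{E}_{q_{\lambda}}\!\left[ \nabla_{\lambda} \log q_{\lambda}(\theta)\, \nabla_{\lambda} \log q_{\lambda}(\theta)^{\top} \right],
\]
where $F(\lambda)$ is the Fisher information matrix of the variational family, whose inversion is the step that becomes costly in high dimensions.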