Stochastic gradient methods have enabled variational inference for high-dimensional models and large data sets. However, the direction of steepest ascent in the parameter space of a statistical model is not given by the commonly used Euclidean gradient but by the natural gradient, which premultiplies the Euclidean gradient by the inverse of the Fisher information matrix. Using natural gradients in optimization can improve convergence significantly, but inverting the Fisher information matrix is daunting in high dimensions. Here we consider structured variational approximations with a minimal conditional exponential family representation, a class that includes highly flexible mixtures of exponential family distributions capable of fitting skewed or multimodal posteriors. We derive complete natural gradient updates for this class of models which, although more complex than the natural gradient updates presented prior to this article, account fully for the dependence between the mixing distribution and the distributions of the components. Further experiments will be carried out to evaluate the performance of the complete natural gradient updates.
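To make the premultiplication concrete, the following is a minimal sketch (not the paper's method) of natural-gradient ascent for a univariate Gaussian, where the Fisher information matrix for the parameters (mu, sigma^2) is known in closed form and is diagonal, so the inverse is trivial. All variable names here are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: natural-gradient ascent on the average log-likelihood
# of a univariate Gaussian N(mu, sigma^2). For theta = (mu, sigma^2) the
# Fisher information is F = diag(1/sigma^2, 1/(2*sigma^4)), so the natural
# gradient is simply F^{-1} times the Euclidean gradient.

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)

mu, sigma2 = 0.0, 1.0   # initial parameters
lr = 0.1                # step size

for _ in range(200):
    # Euclidean gradient of the average log-likelihood
    g_mu = np.mean(data - mu) / sigma2
    g_s2 = -0.5 / sigma2 + 0.5 * np.mean((data - mu) ** 2) / sigma2 ** 2
    # Premultiply by the inverse Fisher information (diagonal here)
    nat_mu = sigma2 * g_mu
    nat_s2 = 2.0 * sigma2 ** 2 * g_s2
    mu += lr * nat_mu
    sigma2 += lr * nat_s2

print(mu, sigma2)  # converges to the sample mean and (biased) sample variance
```

In this toy case the natural gradient rescales each coordinate by the local curvature of the KL divergence, which is why the update for mu reduces to a simple fixed-point step toward the sample mean; in high dimensions the Fisher matrix is dense and its inversion is the bottleneck the abstract refers to.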