Bayesian inference is a popular method for building learning algorithms, but it is hampered by the fact that its key object, the posterior probability distribution, is often intractable to compute. Expectation Propagation (EP) (Minka, 2001) is a popular algorithm that addresses this issue by computing a parametric approximation (e.g., Gaussian) to the posterior density. However, while it is known empirically to quickly compute accurate approximations, EP remains poorly understood, which prevents it from being adopted by a larger fraction of the community. The aim of the present article is to shed intuitive light on EP by relating it to other, better understood methods. More precisely, we link it to the use of gradient descent for computing the Laplace approximation of a target probability distribution. We show that EP is exactly equivalent to performing gradient descent on a smoothed energy landscape, i.e., the original energy landscape convolved with a smoothing kernel. This also relates EP to algorithms that compute the Gaussian approximation minimizing the reverse KL divergence to the target distribution, a link that has been conjectured before but not yet proved rigorously. These results can help practitioners get a better feel for how EP works, and may lead to further results on this important method.
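To make the two ingredients of the abstract concrete, here is a minimal sketch (not the paper's algorithm) of computing a Laplace-style Gaussian approximation of a one-dimensional target by gradient descent on the energy E(x) = -log p(x), and of running the same descent on a Gaussian-smoothed energy, i.e. the energy convolved with a Gaussian kernel. The toy energy, function names, and Monte Carlo smoothing are illustrative assumptions, not taken from the article.

```python
import numpy as np

def energy(x):
    # Toy non-Gaussian energy: -log p(x) up to an additive constant.
    return 0.5 * x**2 + 0.1 * x**4

def grad_energy(x):
    # Derivative of the toy energy.
    return x + 0.4 * x**3

def hess_energy(x):
    # Second derivative, used for the Laplace variance.
    return 1.0 + 1.2 * x**2

def smoothed_grad(x, h=0.5, n_samples=1000, rng=np.random.default_rng(0)):
    # Gradient of the energy convolved with a Gaussian kernel of width h,
    # estimated by Monte Carlo: grad E_h(x) = E[ grad E(x + h z) ], z ~ N(0, 1).
    z = rng.standard_normal(n_samples)
    return grad_energy(x + h * z).mean()

def descend(grad, x0=3.0, lr=0.1, steps=200):
    # Plain gradient descent on whichever energy gradient is supplied.
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Laplace approximation: mean at the mode of the original energy,
# variance from the inverse curvature at that mode.
mode = descend(grad_energy)
laplace_var = 1.0 / hess_energy(mode)

# The same descent on the smoothed energy landscape.
smoothed_mode = descend(smoothed_grad)

print(f"Laplace mean {mode:.3f}, variance {laplace_var:.3f}")
print(f"Minimizer of smoothed energy {smoothed_mode:.3f}")
```

The sketch only illustrates the vocabulary (energy landscape, Laplace approximation, smoothing by convolution with a kernel); the article's contribution is the exact correspondence between EP updates and descent on such a smoothed landscape, which this toy code does not reproduce.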