We study the foundations of variational inference for probabilistic programming; variational inference frames posterior inference as an optimisation problem. In practice, the dominant approach to this optimisation is stochastic gradient descent; in particular, a variant using the so-called reparameterisation gradient estimator exhibits fast convergence in traditional statistics settings. Unfortunately, discontinuities, which are readily expressible in programming languages, can compromise the correctness of this approach. We consider a simple (higher-order, probabilistic) programming language with conditionals, and we endow it with both a measurable value semantics and a smoothed (approximate) value semantics. We present type systems that establish the requisite technical pre-conditions, which allows us to prove stochastic gradient descent with the reparameterisation gradient estimator correct when applied to the smoothed problem. Moreover, by choosing the accuracy coefficient suitably, we can solve the original problem up to any error tolerance. Empirically, we demonstrate that our approach converges similarly to a key competitor, yet is simpler and faster, and attains orders-of-magnitude reductions in work-normalised variance.
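As a minimal sketch (not the paper's formal development), the following JAX snippet illustrates both the failure mode and the smoothing idea for the objective E_{ε∼N(0,1)}[f(θ+ε)]: when f contains a conditional, the reparameterisation gradient estimator is zero almost everywhere and hence biased, whereas smoothing the branch with a sigmoid governed by an accuracy coefficient yields the correct gradient of the smoothed problem. The names `f_hard`, `f_smooth` and `eta` are illustrative assumptions, not identifiers from the paper.

```python
import jax
import jax.numpy as jnp

def f_hard(z):
    # A conditional (e.g. `if z < 0 then 0 else 1`): a discontinuity in z.
    return jnp.where(z < 0.0, 0.0, 1.0)

def f_smooth(z, eta):
    # Smoothed semantics (assumed sigmoid smoothing): the accuracy
    # coefficient eta controls the approximation; eta -> 0 recovers the step.
    return jax.nn.sigmoid(z / eta)

def reparam_grad(f, theta, key, n=10_000):
    # Reparameterisation: z = theta + eps with eps ~ N(0, 1), then
    # differentiate the Monte Carlo average with respect to theta.
    eps = jax.random.normal(key, (n,))
    return jax.grad(lambda t: jnp.mean(f(t + eps)))(theta)

key = jax.random.PRNGKey(0)
theta = jnp.array(0.5)

# The exact gradient of E[f_hard(theta + eps)] = Phi(theta) is the
# standard normal density at theta.
true_grad = jnp.exp(-0.5 * theta**2) / jnp.sqrt(2.0 * jnp.pi)

print(reparam_grad(f_hard, theta, key))                      # 0.0 -- biased
print(reparam_grad(lambda z: f_smooth(z, 0.1), theta, key))  # close to true_grad
print(true_grad)                                             # approx. 0.352
```

The estimator for `f_hard` returns exactly zero because the integrand is piecewise constant in θ, even though the true gradient is nonzero; the smoothed variant restores a meaningful gradient at the cost of a controllable approximation error.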