In this paper, we introduce a new form of amortized variational inference by using the forward KL divergence in a joint-contrastive variational loss. The resulting forward amortized variational inference is a likelihood-free method, since its gradient can be sampled without bias and without requiring any evaluation of either the model joint distribution or its derivatives. We prove that our new variational loss is optimized by the exact posterior marginals in the fully factorized mean-field approximation, a property that is not shared by the more conventional reverse KL inference. Furthermore, we show that forward amortized inference can easily be marginalized over large families of latent variables in order to obtain a marginalized variational posterior. We consider two examples of variational marginalization. In the first example, we train a Bayesian forecaster for predicting a simplified chaotic model of atmospheric convection. In the second example, we train an amortized variational approximation of a Bayesian optimal classifier by marginalizing over the model space. The result is a powerful meta-classification network that can solve arbitrary classification problems without further training.
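To make the likelihood-free property concrete, the following is a minimal sketch (not the paper's implementation) of training with the forward KL joint-contrastive loss, E_{p(z,x)}[-log q(z|x)]: one samples pairs (z, x) from the generative model and fits the amortized posterior q by maximum likelihood, so the gradient never touches p(z, x) or its derivatives. The toy model (z ~ N(0,1), x|z ~ N(z,1)) and the Gaussian posterior parameterization q(z|x) = N(a·x + b, s²) are assumptions chosen for illustration; here the exact posterior is N(x/2, 1/2), so the fitted parameters can be checked against a = 1/2, b = 0, s² = 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy generative model: z ~ N(0, 1), x | z ~ N(z, 1).
# The exact posterior is z | x ~ N(x/2, 1/2).
N = 50_000
z = rng.normal(0.0, 1.0, N)   # sample latents from the prior
x = rng.normal(z, 1.0)        # sample data from the likelihood

# Amortized Gaussian posterior q(z|x) = N(a*x + b, s^2),
# with variational parameters (a, b, log_s).
a, b, log_s = 0.0, 0.0, 0.0
lr = 0.1
for _ in range(500):
    s2 = np.exp(2.0 * log_s)
    err = z - (a * x + b)
    # Unbiased Monte Carlo gradients of E_{p(z,x)}[-log q(z|x)];
    # note they only require samples (z, x), never p(z, x) itself.
    grad_a = np.mean(-err * x) / s2
    grad_b = np.mean(-err) / s2
    grad_log_s = np.mean(1.0 - err**2 / s2)
    a -= lr * grad_a
    b -= lr * grad_b
    log_s -= lr * grad_log_s

print(a, b, np.exp(2.0 * log_s))  # approaches 0.5, 0.0, 0.5
```

Because the expectation is taken under the model joint p(z, x) rather than under q, this is an ordinary regression/maximum-likelihood problem in the variational parameters, which is why no reparameterization trick or score-function estimator is needed.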