Due to the intractable partition function, training energy-based models (EBMs) by maximum likelihood requires Markov chain Monte Carlo (MCMC) sampling to approximate the gradient of the Kullback-Leibler divergence between the data and model distributions. However, sampling from an EBM is non-trivial because of the difficulty of mixing between modes. In this paper, we propose to learn a variational auto-encoder (VAE) to initialize the finite-step MCMC, such as Langevin dynamics derived from the energy function, for efficient amortized sampling of the EBM. With these amortized MCMC samples, the EBM can be trained by maximum likelihood, which follows an "analysis by synthesis" scheme, while the VAE learns from these MCMC samples via variational Bayes. We call this joint training algorithm variational MCMC teaching, in which the VAE chases the EBM toward the data distribution. We interpret the learning algorithm as a dynamic alternating projection in the context of information geometry. Our proposed models can generate samples comparable to those of GANs and EBMs. Additionally, we demonstrate that our model can learn an effective probabilistic distribution for supervised conditional learning tasks.
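To make the alternating procedure concrete, below is a minimal PyTorch-style sketch of one iteration of the joint training loop described above. All names (`ebm`, `encoder`, `decoder`), the energy sign convention p(x) ∝ exp(-E(x)), the Gaussian reconstruction loss, the flat-vector data shape, and the hyperparameters are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch (not the authors' code) of one variational MCMC teaching
# iteration. Assumes data x are flat vectors of shape (batch, d), an energy
# network `ebm` with p(x) proportional to exp(-E(x)), and a VAE `encoder`
# returning (mu, logvar) plus a `decoder` mapping latents to data space.
import torch

def langevin_step(x, ebm, step_size=0.01):
    # One step of Langevin dynamics derived from the energy function:
    # x <- x - (s^2 / 2) * grad_x E(x) + s * noise
    x = x.detach().requires_grad_(True)
    grad = torch.autograd.grad(ebm(x).sum(), x)[0]
    return (x - 0.5 * step_size ** 2 * grad
            + step_size * torch.randn_like(x)).detach()

def training_iteration(x_data, ebm, encoder, decoder, ebm_opt, vae_opt,
                       mcmc_steps=20, latent_dim=128):
    # 1) Amortized sampling: the VAE decoder initializes a short MCMC chain,
    # so the chain starts near the model's modes instead of from noise.
    z0 = torch.randn(x_data.size(0), latent_dim)
    x_synth = decoder(z0).detach()
    for _ in range(mcmc_steps):
        x_synth = langevin_step(x_synth, ebm)

    # 2) EBM update ("analysis by synthesis"): the maximum-likelihood
    # gradient contrasts energies of data and of the amortized MCMC samples.
    ebm_loss = ebm(x_data).mean() - ebm(x_synth).mean()
    ebm_opt.zero_grad()
    ebm_loss.backward()
    ebm_opt.step()

    # 3) VAE update (variational Bayes): the VAE chases the EBM by treating
    # the MCMC samples as its training data (reconstruction + KL terms).
    mu, logvar = encoder(x_synth)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    recon = decoder(z)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    vae_loss = (recon - x_synth).pow(2).sum(dim=1).mean() + kl
    vae_opt.zero_grad()
    vae_loss.backward()
    vae_opt.step()
```

The key design point this sketch illustrates is that the MCMC chain is only `mcmc_steps` long: because the VAE amortizes the sampling by providing initializations close to the EBM's current distribution, a short finite-step chain can suffice where a chain started from noise would struggle to mix between modes.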