Despite progress in introducing auxiliary amortized inference models, learning discrete latent variable models remains challenging. In this paper, we show that the difficulty of obtaining reliable stochastic gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can both be gracefully addressed by a new method based on stochastic approximation (SA) theory of the Robbins-Monro type. Specifically, we propose to directly maximize the target log-likelihood and simultaneously minimize the inclusive divergence between the posterior and the inference model. The resulting learning algorithm is called joint SA (JSA). To the best of our knowledge, JSA represents the first method that couples an SA version of the EM (expectation-maximization) algorithm (SAEM) with an adaptive MCMC procedure. Experiments on several benchmark generative modeling and structured prediction tasks show that JSA consistently outperforms recent competitive algorithms, with faster convergence, better final likelihoods, and lower variance of gradient estimates.
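To make the coupling concrete, below is a minimal, self-contained sketch (not the paper's implementation) of a JSA-style update on a toy model with a single binary latent variable: the inference model serves as the proposal of a Metropolis independence sampler, and the accepted sample drives stochastic-approximation updates of both the model parameters and the inference parameters. The toy model, parameter names (a, b, c), synthetic data, and learning rate are illustrative assumptions, not taken from the paper.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_bern(y, logit):
    # log Bernoulli(y; p = sigmoid(logit)) for y in {0, 1}, numerically stable
    z = logit if y == 1 else -logit
    return -math.log1p(math.exp(-z)) if z > 0 else z - math.log1p(math.exp(z))

# Toy model (hypothetical, for illustration only):
#   p_theta(h)   = Bernoulli(sigmoid(a))      -- prior over binary latent h
#   p_theta(x|h) = Bernoulli(sigmoid(b[h]))   -- emission of binary observation x
#   q_phi(h|x)   = Bernoulli(sigmoid(c[x]))   -- amortized inference model
a, b, c = 0.0, [0.0, 0.0], [0.0, 0.0]
lr = 0.05

def log_joint(x, h):
    return log_bern(h, a) + log_bern(x, b[h])

random.seed(0)
data = [int(random.random() < 0.7) for _ in range(200)]  # synthetic toy data
cache = [random.randint(0, 1) for _ in data]              # one persistent latent per example

for step in range(5000):
    i = random.randrange(len(data))
    x, h = data[i], cache[i]

    # (1) Adaptive MCMC step: Metropolis independence sampler with q_phi as proposal.
    h_prop = int(random.random() < sigmoid(c[x]))
    log_ratio = (log_joint(x, h_prop) - log_bern(h_prop, c[x])) \
              - (log_joint(x, h) - log_bern(h, c[x]))
    if math.log(random.random() + 1e-300) < log_ratio:
        h = h_prop
    cache[i] = h

    # (2) SA update of theta: stochastic ascent on log p_theta(x, h) (SAEM-style step).
    a += lr * (h - sigmoid(a))         # d/da   log Bernoulli(h; sigmoid(a))
    b[h] += lr * (x - sigmoid(b[h]))   # d/db_h log Bernoulli(x; sigmoid(b_h))

    # (3) SA update of phi: stochastic ascent on log q_phi(h|x), which
    #     stochastically minimizes the inclusive KL(p_theta(h|x) || q_phi(h|x)).
    c[x] += lr * (h - sigmoid(c[x]))
```

Because the accepted latent sample is (asymptotically) drawn from the true posterior, step (3) fits q_phi toward the posterior under the inclusive divergence, while step (2) ascends the target log-likelihood directly rather than a surrogate bound.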