Stochastic gradient-based optimisation for discrete latent variable models is challenging due to the high variance of gradient estimates. We introduce a variance reduction technique for score function estimators that makes use of double control variates. These control variates act on top of a main control variate to further reduce the variance of the overall estimator. We develop a double control variate for the REINFORCE leave-one-out estimator using Taylor expansions. For training discrete latent variable models, such as variational autoencoders with binary latent variables, our approach adds no extra computational cost compared to standard training with the REINFORCE leave-one-out estimator. We apply our method to challenging high-dimensional toy examples and to training variational autoencoders with binary latent variables. We show that our estimator can have lower variance than other state-of-the-art estimators.
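For reference, the sketch below illustrates the plain REINFORCE leave-one-out (RLOO) estimator that the proposed double control variate builds on; it is not taken from the paper, and the function name `rloo_gradient`, the independent-Bernoulli parameterisation via logits `theta`, and the toy objective are illustrative assumptions.

```python
import numpy as np

def rloo_gradient(theta, f, num_samples=4, rng=None):
    """REINFORCE leave-one-out (RLOO) estimate of
    d/d theta E_{z ~ Bernoulli(sigmoid(theta))}[f(z)]
    for a vector of independent binary latent variables.

    The leave-one-out mean of f over the other samples acts as a
    per-sample baseline (control variate), keeping the estimator
    unbiased while reducing its variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = 1.0 / (1.0 + np.exp(-theta))          # Bernoulli means
    z = (rng.random((num_samples, theta.size)) < probs).astype(np.float64)
    fz = np.array([f(z_k) for z_k in z])          # objective values, shape (K,)

    # Leave-one-out baseline: mean of f over the other K - 1 samples.
    baselines = (fz.sum() - fz) / (num_samples - 1)

    # Score function of independent Bernoullis with logits theta:
    # d/d theta log q_theta(z) = z - sigmoid(theta).
    scores = z - probs                            # shape (K, D)
    return ((fz - baselines)[:, None] * scores).mean(axis=0)

# Hypothetical usage: f(z) = ||z - 0.499||^2, a common hard benchmark
# for score-function estimators on binary latents.
D = 8
f = lambda z: float(np.sum((z - 0.499) ** 2))
grad = rloo_gradient(np.zeros(D), f, num_samples=8)
```

The paper's contribution, a second (double) control variate built from Taylor expansions, would be applied on top of this leave-one-out baseline; its exact form follows the paper rather than this sketch.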