Gradient estimation is often necessary for fitting generative models with discrete latent variables, in contexts such as reinforcement learning and variational autoencoder (VAE) training. The DisARM estimator (Yin et al. 2020; Dong, Mnih, and Tucker 2020) achieves state-of-the-art gradient variance for Bernoulli latent variable models in many contexts. However, DisARM and other estimators have potentially exploding variance near the boundary of the parameter space, where solutions tend to lie. To ameliorate this issue, we propose a new gradient estimator, \textit{bitflip}-1, that has lower variance at the boundaries of the parameter space. Because bitflip-1 has properties complementary to those of existing estimators, we introduce an aggregated estimator, \textit{unbiased gradient variance clipping} (UGC), which uses either a bitflip-1 or a DisARM gradient update for each coordinate. We prove that UGC has uniformly lower variance than DisARM. Empirically, we observe that UGC attains the optimal value of the optimization objective in toy experiments, in discrete VAE training, and in a best-subset selection problem.
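To make the per-coordinate aggregation concrete, the following is a minimal Python sketch. The DisARM term follows the antithetic-pair formula of Dong, Mnih, and Tucker (2020); the bitflip-1 term is written in a generic single-bit-flip (local-expectation) form, which is an assumption rather than the paper's exact estimator; and the selection mask `use_bitflip` is a hypothetical placeholder for the variance-clipping rule that decides, coordinate by coordinate, which update to apply.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ugc_gradient(f, logits, rng):
    """Hedged sketch of a UGC-style per-coordinate gradient estimate.

    f      : callable mapping a binary vector in {0,1}^d to a scalar
    logits : array of shape (d,), Bernoulli logits alpha_i
    rng    : numpy random Generator
    """
    d = logits.shape[0]
    theta = sigmoid(logits)

    # Antithetic pair for DisARM: u ~ Uniform(0,1),
    # b1 = 1[u < theta], b2 = 1[1 - u < theta].
    u = rng.uniform(size=d)
    b1 = (u < theta).astype(float)
    b2 = ((1.0 - u) < theta).astype(float)
    f1, f2 = f(b1), f(b2)

    # DisARM per-coordinate estimate (Dong, Mnih, and Tucker 2020):
    # (1/2)(f(b1) - f(b2)) (-1)^{b2_i} 1[b1_i != b2_i] sigma(|alpha_i|).
    disarm = 0.5 * (f1 - f2) * ((-1.0) ** b2) * (b1 != b2) * sigmoid(np.abs(logits))

    # bitflip term per coordinate: compare f(b1) against f at b1 with bit i
    # flipped (a local-expectation form; an assumption, not the paper's formula).
    bitflip = np.empty(d)
    for i in range(d):
        b_flip = b1.copy()
        b_flip[i] = 1.0 - b_flip[i]
        # Unbiased estimate of dE[f]/dtheta_i, then chain rule through the
        # sigmoid to the logit; the theta*(1-theta) factor damps the estimate
        # near the boundary of the parameter space.
        bitflip[i] = (2.0 * b1[i] - 1.0) * (f1 - f(b_flip)) * theta[i] * (1.0 - theta[i])

    # Placeholder selection rule: UGC picks one update per coordinate; the
    # threshold below is hypothetical, for illustration only.
    use_bitflip = np.abs(logits) > 2.0
    return np.where(use_bitflip, bitflip, disarm)

# Example usage: gradient ascent on E_q[f(b)] with f(b) = -sum((b - t)^2),
# which pushes the Bernoulli means toward the target vector t.
rng = np.random.default_rng(0)
t = np.array([1.0, 0.0, 1.0, 1.0])
f = lambda b: -np.sum((b - t) ** 2)
logits = np.zeros(4)
for _ in range(500):
    logits += 0.1 * ugc_gradient(f, logits, rng)
```

Note the contrast in boundary behavior: the bitflip term carries a $\theta_i(1-\theta_i)$ factor that vanishes as a logit grows, consistent with the abstract's claim of lower variance at the boundaries, whereas DisARM's $\sigma(|\alpha_i|)$ factor approaches 1 there.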