Training models with discrete latent variables is challenging due to the high variance of unbiased gradient estimators. While low-variance reparameterization gradients of a continuous relaxation can provide an effective solution, a continuous relaxation is not always available or tractable. Dong et al. (2020) and Yin et al. (2020) introduced a performant estimator that does not rely on continuous relaxations; however, it is limited to binary random variables. We introduce a novel derivation of their estimator based on importance sampling and statistical couplings, which we extend to the categorical setting. Motivated by the construction of a stick-breaking coupling, we introduce gradient estimators based on reparameterizing categorical variables as sequences of binary variables and on Rao-Blackwellization. In systematic experiments, we show that our proposed categorical gradient estimators provide state-of-the-art performance, whereas even with additional Rao-Blackwellization, previous estimators (Yin et al., 2019) underperform a simpler REINFORCE estimator with a leave-one-out baseline (Kool et al., 2019).
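For orientation, the comparison point named above, REINFORCE with a leave-one-out baseline, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the estimator proposed in this work and not the authors' code: the function names `softmax` and `reinforce_loo_gradient`, the logit parameterization of the categorical distribution, and the toy objective `f` are all hypothetical choices made for the example.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()


def reinforce_loo_gradient(logits, f, num_samples, rng):
    """Estimate the gradient of E_{z ~ Cat(softmax(logits))}[f(z)] w.r.t. logits.

    Uses the score-function (REINFORCE) estimator, where each sample's
    baseline is the mean of f over the *other* samples (leave-one-out).
    """
    if num_samples < 2:
        raise ValueError("the leave-one-out baseline needs at least 2 samples")

    probs = softmax(logits)
    num_categories = len(logits)

    # Draw independent categorical samples and evaluate the objective on each.
    samples = rng.choice(num_categories, size=num_samples, p=probs)
    values = np.array([f(z) for z in samples], dtype=float)

    # Leave-one-out baseline: average of the other samples' objective values.
    baselines = (values.sum() - values) / (num_samples - 1)

    # Score function of a categorical w.r.t. its logits: onehot(z) - probs.
    scores = np.eye(num_categories)[samples] - probs

    # Average the baseline-corrected score-function terms over the samples.
    return ((values - baselines)[:, None] * scores).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = np.array([0.5, -1.0, 2.0])
    objective_values = np.array([1.0, 2.0, 4.0])  # toy objective, one value per category

    estimate = reinforce_loo_gradient(
        logits, lambda z: objective_values[z], num_samples=4, rng=rng
    )

    # Closed-form gradient for comparison: p_j * (f(j) - E[f]).
    probs = softmax(logits)
    exact = probs * (objective_values - probs @ objective_values)
    print("estimate:", estimate)
    print("exact:   ", exact)
```

Because each baseline is computed only from the other samples, it is independent of the sample it corrects, so the estimator stays unbiased while typically having lower variance than plain REINFORCE. A single call with few samples is still noisy; averaging many calls approaches the closed-form gradient.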