Estimating gradients for binary variables is a task that arises frequently in many domains, such as training discrete latent variable models. A common approach is REINFORCE-based Monte Carlo estimation using either independent samples or pairs of negatively correlated samples. To make better use of more than two samples, we propose ARMS, an Antithetic REINFORCE-based Multi-Sample gradient estimator. ARMS uses a copula to generate any number of mutually antithetic samples. It is unbiased, has low variance, and generalizes both DisARM, which we show to be ARMS with two samples, and the leave-one-out REINFORCE (LOORF) estimator, which is ARMS with uncorrelated samples. We evaluate ARMS on several datasets for training generative models, and our experimental results show that it outperforms competing methods. We also develop a version of ARMS for optimizing the multi-sample variational bound and show that it outperforms both VIMCO and DisARM. The code is publicly available.
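To make the uncorrelated-sample special case concrete, the following is a minimal sketch of the standard leave-one-out REINFORCE (LOORF) estimator for independent Bernoulli variables parameterized by logits. It is not the ARMS estimator and is not taken from the released code: the copula-based antithetic sampling is not specified in the abstract and is not attempted here. The function names (`loorf_gradient`, `toy_objective`, `exact_gradient`), the toy objective, and the target vector are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def loorf_gradient(f, logits, n_samples, rng):
    """Leave-one-out REINFORCE estimate of d/d(logits) E_b[f(b)].

    b has independent Bernoulli(sigmoid(logits)) entries and f maps a batch
    of shape (n_samples, D) to values of shape (n_samples,).
    Illustrative sketch only: samples are independent, not antithetic.
    """
    if n_samples < 2:
        raise ValueError("LOORF needs at least two samples")
    probs = sigmoid(logits)
    # Draw n_samples independent binary vectors.
    b = (rng.random((n_samples, logits.size)) < probs).astype(np.float64)
    f_vals = f(b)
    # Score function of a Bernoulli with respect to its logit: b - sigmoid(logit).
    score = b - probs
    # Using each sample's leave-one-out mean as its baseline is algebraically
    # equal to centering by the full mean and dividing by (n_samples - 1).
    centered = f_vals - f_vals.mean()
    return centered @ score / (n_samples - 1)


# Hypothetical toy problem: f(b) = sum_d (b_d - t_d)^2 with a fixed target t.
target = np.array([0.45, 0.5, 0.6])
logits = np.array([0.3, -0.2, 1.0])


def toy_objective(b):
    return ((b - target) ** 2).sum(axis=-1)


def exact_gradient(logits, target):
    # E[f] = sum_d p_d (1 - t_d)^2 + (1 - p_d) t_d^2, so the derivative with
    # respect to logit d is p_d (1 - p_d) (1 - 2 t_d).
    p = sigmoid(logits)
    return p * (1.0 - p) * (1.0 - 2.0 * target)


rng = np.random.default_rng(0)
estimate = loorf_gradient(toy_objective, logits, n_samples=200_000, rng=rng)
exact = exact_gradient(logits, target)
```

Because the toy objective has a closed-form expectation, `estimate` can be compared against `exact` as a sanity check on unbiasedness. An antithetic scheme such as the one the abstract describes would replace the independent draws of `b` with negatively correlated ones while keeping each sample's marginal distribution unchanged.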