Contrastive Divergence Learning is a Time Reversal Adversarial Game

Contrastive divergence (CD) learning is a classical method for fitting unnormalized statistical models to data samples. Despite its widespread use, the convergence properties of this algorithm are still not well understood. The main source of difficulty is an unjustified approximation that has been used to derive the gradient of the loss. In this paper, we present an alternative derivation of CD that does not require any approximation and sheds new light on the objective that is actually being optimized by the algorithm. Specifically, we show that CD is an adversarial learning procedure, where a discriminator attempts to classify whether a Markov chain generated from the model has been time-reversed. Thus, although it predates generative adversarial networks (GANs) by more than a decade, CD is, in fact, closely related to these techniques. Our derivation agrees with previous observations, which have concluded that CD's update steps cannot be expressed as the gradients of any fixed objective function. In addition, as a byproduct, our derivation reveals a simple correction that can be used as an alternative to Metropolis-Hastings rejection, which is required when the underlying Markov chain is inexact (e.g., when using Langevin dynamics with a large step).
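
For readers unfamiliar with the classical procedure the abstract refers to, the following is a minimal sketch of a standard CD update, not the derivation or the correction proposed in this paper. It assumes a toy one-dimensional energy model E(x) = 0.5 * (x - theta)^2 with a single parameter theta, and uses one unadjusted Langevin step (no Metropolis-Hastings rejection) started at the data to produce the negative samples. All function names, the energy model, and the hyperparameter values are illustrative assumptions.

```python
import numpy as np


def langevin_step(x, theta, step, rng):
    """One unadjusted Langevin step for the assumed energy E(x) = 0.5 * (x - theta)^2."""
    grad_x = x - theta  # dE/dx for the toy energy
    noise = rng.standard_normal(x.shape)
    return x - 0.5 * step**2 * grad_x + step * noise


def cd_update(theta, data, step, lr, rng, k=1):
    """Classical CD-k update: short chains are initialized at the data samples."""
    x_neg = data.copy()
    for _ in range(k):
        x_neg = langevin_step(x_neg, theta, step, rng)
    # dE/dtheta = theta - x; the update contrasts data samples with chain samples.
    grad = np.mean(theta - data) - np.mean(theta - x_neg)
    return theta - lr * grad


def fit(data, theta0=0.0, step=0.5, lr=0.5, n_iters=2000, seed=0):
    """Repeat the CD update; all hyperparameters here are arbitrary illustrative choices."""
    rng = np.random.default_rng(seed)
    theta = theta0
    for _ in range(n_iters):
        theta = cd_update(theta, data, step, lr, rng)
    return theta


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(3.0, 1.0, size=5000)
    print("estimated theta:", fit(data), "data mean:", data.mean())
```

In this toy setting the update has its fixed point at the sample mean of the data, so the estimate should settle near it up to noise from the chain. Because the Langevin step is used without a rejection step, the chain is inexact in the sense described in the abstract.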
