Contrastive divergence (CD) learning is a classical method for fitting unnormalized statistical models to data samples. Despite its widespread use, the convergence properties of this algorithm are still not well understood. The main source of difficulty is an unjustified approximation that has been used to derive the gradient of the loss. In this paper, we present an alternative derivation of CD that does not require any approximation and sheds new light on the objective that is actually being optimized by the algorithm. Specifically, we show that CD is an adversarial learning procedure, in which a discriminator attempts to classify whether a Markov chain generated from the model has been time-reversed. Thus, although predating generative adversarial networks (GANs) by more than a decade, CD is, in fact, closely related to these techniques. Our derivation is consistent with previous observations, which concluded that CD's update steps cannot be expressed as the gradients of any fixed objective function. In addition, as a byproduct, our derivation reveals a simple correction that can be used as an alternative to Metropolis-Hastings rejection, which is required when the underlying Markov chain is inexact (e.g., when using Langevin dynamics with a large step size).
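To make the setting concrete, the following is a minimal sketch (not the paper's method) of CD-1 learning with an unadjusted Langevin transition, on a hypothetical toy energy model E(x; θ) = θx²/2, i.e., an unnormalized Gaussian with precision θ. All variable names and hyperparameters here are illustrative assumptions. Note that because the Langevin kernel is used without Metropolis-Hastings rejection, the learned θ is biased away from the true precision — precisely the kind of inexactness that the correction mentioned in the abstract addresses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy energy model: E(x; theta) = 0.5 * theta * x**2 (illustrative choice).
def dE_dtheta(x):
    return 0.5 * x**2          # gradient of E w.r.t. the parameter theta

def dE_dx(x, theta):
    return theta * x           # gradient of E w.r.t. x, drives the Langevin step

# Synthetic data: Gaussian with true precision 4.0 (std = 0.5).
data = rng.normal(0.0, 0.5, size=5000)

theta = 1.0     # initial parameter
lr = 1.0        # learning rate (illustrative)
eps = 0.05      # Langevin step size; larger eps = larger chain bias

for _ in range(2000):
    x0 = rng.choice(data, size=512)                  # positive (data) samples
    # One unadjusted Langevin step produces the negative samples (CD-1).
    x1 = x0 - eps * dE_dx(x0, theta) \
            + np.sqrt(2.0 * eps) * rng.normal(size=x0.shape)
    # CD update: lower the energy of data, raise it on the chain samples.
    grad = dE_dtheta(x0).mean() - dE_dtheta(x1).mean()
    theta -= lr * grad

# theta drifts toward the data precision (true value 4.0), up to the bias
# introduced by skipping Metropolis-Hastings rejection.
```

With a smaller `eps` the fixed point of this update moves closer to the true precision, at the cost of slower mixing; with Metropolis-Hastings rejection (or the correction derived in the paper), the bias would be removed at any step size.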