Contrastive divergence is a popular method of training energy-based models, but is known to have difficulties with training stability. We propose an adaptation to improve contrastive divergence training by scrutinizing a gradient term that is difficult to calculate and is often left out for convenience. We show that this gradient term is numerically significant and in practice is important to avoid training instabilities, while being tractable to estimate. We further highlight how data augmentation and multi-scale processing can be used to improve model robustness and generation quality. Finally, we empirically evaluate the stability of model architectures and show improved performance on a host of benchmarks and use cases, such as image generation, OOD detection, and compositional generation.
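For context, a common presentation of the contrastive divergence gradient makes the omitted term explicit (the notation here is ours, with $E_\theta$ the energy function, $p_\theta$ the model distribution, and $q_\theta$ the distribution of MCMC samples; the paper's exact formulation may differ):

$$
\nabla_\theta \mathcal{L}_{\mathrm{CD}}
\;=\;
\mathbb{E}_{p_{\mathrm{data}}(x)}\!\big[\nabla_\theta E_\theta(x)\big]
\;-\;
\mathbb{E}_{q_\theta(x')}\!\big[\nabla_\theta E_\theta(x')\big]
\;-\;
\underbrace{\frac{\partial}{\partial \theta}\,\mathrm{KL}\!\big(q_\theta \,\|\, p_\theta\big)\Big|_{p_\theta\ \mathrm{fixed}}}_{\text{typically omitted}}
$$

The last term arises because the sampler distribution $q_\theta$ itself depends on the model parameters $\theta$; dropping it is the convenience the abstract refers to.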