In this paper, we quantify the impact of using non-convergent Markov chains to train Energy-Based Models (EBMs). In particular, we show analytically that EBMs trained with non-persistent short runs to estimate the gradient can perfectly reproduce a set of empirical statistics of the data, not at the level of the equilibrium measure, but through a precise dynamical process. Our results provide a first-principles explanation for the observations of recent works proposing short runs starting from random initial conditions as an efficient strategy to generate high-quality samples in EBMs, and lay the groundwork for using EBMs as diffusion models. After explaining this effect in generic EBMs, we analyze two solvable models in which the effect of the non-convergent sampling on the trained parameters can be described in detail. Finally, we test these predictions numerically on the Boltzmann machine.
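The training scheme the abstract refers to — estimating the log-likelihood gradient with short, non-persistent MCMC runs restarted from random configurations at every update — can be sketched for a fully visible ±1 Boltzmann machine as follows. This is a minimal illustration under assumed conventions (Gibbs sampling, symmetric couplings `J` with zero diagonal, fields `h`); the function names are illustrative, not the paper's code:

```python
import numpy as np

def gibbs_step(s, J, h, rng):
    """One full Gibbs sweep over +/-1 spins, batch of chains in rows of s."""
    n = s.shape[1]
    for i in range(n):
        field = s @ J[:, i] + h[i]               # local field on spin i
        p = 1.0 / (1.0 + np.exp(-2.0 * field))   # P(s_i = +1 | rest)
        s[:, i] = np.where(rng.random(s.shape[0]) < p, 1.0, -1.0)
    return s

def shortrun_gradient(data, J, h, k, rng):
    """Non-persistent short-run gradient estimate: chains are re-initialized
    at random every call and run only k Gibbs sweeps (far from equilibrium)."""
    m, n = data.shape
    s = rng.choice([-1.0, 1.0], size=(m, n))     # fresh random initialization
    for _ in range(k):
        s = gibbs_step(s, J, h, rng)
    # Moment matching: data statistics minus short-run model statistics.
    dJ = (data.T @ data - s.T @ s) / m           # <s_i s_j>_data - <s_i s_j>_k
    np.fill_diagonal(dJ, 0.0)
    dh = data.mean(0) - s.mean(0)                # <s_i>_data - <s_i>_k
    return dJ, dh
```

A training loop would then update `J += lr * dJ` and `h += lr * dh`; because the chains never equilibrate, the fixed point matches the data statistics through the k-step dynamical process rather than through the equilibrium measure, which is precisely the effect the paper quantifies.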