Energy-based models (EBMs) are generative models that are usually trained via maximum likelihood estimation. This approach becomes challenging in generic situations where the trained energy is non-convex, due to the need to sample the Gibbs distribution associated with this energy. Using general Fenchel duality results, we derive variational principles dual to maximum likelihood EBMs with shallow overparametrized neural network energies, both in the feature-learning and lazy linearized regimes. In the feature-learning regime, this dual formulation justifies using a two time-scale gradient ascent-descent (GDA) training algorithm in which one concurrently updates the particles in the sample space and the neurons in the parameter space of the energy. We also consider a variant of this algorithm in which the particles are sometimes restarted at random samples drawn from the data set, and show that performing these restarts at every iteration step corresponds to score matching training. These results are illustrated in simple numerical experiments, which indicate that GDA performs best when features and particles are updated on similar time scales.
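As a concrete illustration, the following is a minimal sketch, not the authors' implementation, of such a two time-scale GDA training loop for a shallow ReLU-network energy in plain NumPy. The network sizes, step sizes eta_x and eta_theta, the inverse temperature beta, the restart probability p_restart, and the toy Gaussian-mixture data set are all illustrative assumptions introduced here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n_particles, n_data = 2, 128, 256, 512
# Placeholder data set (assumption): a two-component Gaussian mixture.
data = np.concatenate([rng.normal(-2.0, 0.5, size=(n_data // 2, d)),
                       rng.normal(+2.0, 0.5, size=(n_data // 2, d))])

# Shallow overparametrized energy E(x) = (1/m) * sum_j c_j * relu(a_j . x + b_j)
A = rng.normal(size=(m, d))   # neuron weights a_j
b = rng.normal(size=m)        # neuron biases b_j
c = rng.normal(size=m)        # output weights c_j

def preact(X):
    return X @ A.T + b                              # (N, m)

def energy(X):
    return np.maximum(preact(X), 0.0) @ c / m       # (N,)

def grad_x(X):
    """Gradient of the energy w.r.t. the inputs, used for the particle descent step."""
    mask = (preact(X) > 0).astype(X.dtype)          # ReLU derivative, (N, m)
    return (mask * c) @ A / m                       # (N, d)

def param_grads(X):
    """Batch-averaged gradients of the energy w.r.t. c, A, b."""
    h = preact(X)
    relu, mask = np.maximum(h, 0.0), (h > 0).astype(X.dtype)
    g_c = relu.mean(axis=0) / m
    g_A = (mask * c).T @ X / (m * X.shape[0])
    g_b = (mask * c).mean(axis=0) / m
    return g_c, g_A, g_b

eta_x, eta_theta, beta = 1e-2, 1e-2, 1.0   # similar time scales (assumed values)
p_restart = 0.0                            # p_restart = 1.0 restarts at every step
X = data[rng.integers(n_data, size=n_particles)].copy()  # particles start at data

for step in range(2000):
    # Descent step on the particles: noisy (Langevin) gradient descent on the energy.
    X -= eta_x * grad_x(X)
    X += np.sqrt(2 * eta_x / beta) * rng.normal(size=X.shape)

    # Optional restarts: reset a random subset of particles to data samples.
    reset = rng.random(n_particles) < p_restart
    X[reset] = data[rng.integers(n_data, size=reset.sum())]

    # Ascent step on the parameters: stochastic log-likelihood gradient,
    # i.e. E_model[grad_theta E] - E_data[grad_theta E].
    batch = data[rng.integers(n_data, size=n_particles)]
    gc_m, gA_m, gb_m = param_grads(X)
    gc_d, gA_d, gb_d = param_grads(batch)
    c += eta_theta * (gc_m - gc_d)
    A += eta_theta * (gA_m - gA_d)
    b += eta_theta * (gb_m - gb_d)
```

In this sketch the particles descend the energy (with Langevin noise) while the neurons ascend the log-likelihood; setting p_restart = 1.0 restarts the particles at data samples at every iteration, the regime the abstract relates to score matching training.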