The variational lower bound (a.k.a. ELBO or free energy) is the central objective of many established as well as many novel algorithms for unsupervised learning. Learning algorithms change model parameters such that the variational lower bound increases, and learning usually proceeds until the parameters have converged to values close to a stationary point of the learning dynamics. In this purely theoretical contribution, we show that (for a very large class of generative models) the variational lower bound is, at all stationary points of learning, equal to a sum of entropies. For standard machine learning models with one set of latent and one set of observed variables, the sum consists of three entropies: (A) the (average) entropy of the variational distributions, (B) the negative entropy of the model's prior distribution, and (C) the (expected) negative entropy of the observable distributions. The result holds under realistic conditions: for finite numbers of data points, at any stationary point (including saddle points), and for any family of (well-behaved) variational distributions. The class of generative models for which we show the equality to entropy sums contains many well-known models; as concrete examples we discuss Sigmoid Belief Networks, probabilistic PCA, and (Gaussian and non-Gaussian) mixture models. The result also applies to standard (Gaussian) variational autoencoders, as has been shown in parallel work (Damm et al., 2023). The prerequisites we use to show equality to entropy sums are relatively mild: concretely, the distributions of a given generative model have to belong to the exponential family (with constant base measure), and the model has to satisfy a parameterization criterion (which is usually fulfilled). Proving the equality of the ELBO to entropy sums at stationary points (under the stated conditions) is the main contribution of this work.
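As an illustration of the stated result, the following is a minimal numerical sketch (not from the paper; all variable names are ours) for one of the example model classes, an isotropic Gaussian mixture trained with EM. After convergence, the ELBO is compared against the three-term entropy sum: the average entropy of the (exact) posteriors, minus the entropy of the categorical prior, minus the expected entropy of the Gaussian observable distributions.

```python
import numpy as np

# Hedged sketch: verify ELBO == entropy sum for an isotropic Gaussian
# mixture at a stationary point of EM. Illustrative code, not the paper's.

rng = np.random.default_rng(0)
N, D, K = 600, 2, 3
X = np.concatenate([rng.normal(m, 0.5, size=(N // K, D))
                    for m in ([0.0, 0.0], [4.0, 0.0], [0.0, 4.0])])

pi = np.full(K, 1.0 / K)                 # mixing proportions (prior)
mu = X[rng.choice(N, K, replace=False)]  # component means
var = np.ones(K)                         # isotropic variances

def log_gauss(X, mu, var):
    """log N(x_n; mu_k, var_k * I) for all pairs (n, k)."""
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    return -0.5 * (D * np.log(2.0 * np.pi * var) + d2 / var)

for _ in range(100):                     # EM until (near) stationarity
    logp = np.log(pi) + log_gauss(X, mu, var)
    logp -= logp.max(axis=1, keepdims=True)
    q = np.exp(logp)
    q /= q.sum(axis=1, keepdims=True)    # E-step: exact posteriors
    Nk = q.sum(axis=0)                   # M-step: stationary parameters
    pi = Nk / N
    mu = (q.T @ X) / Nk[:, None]
    var = (q * ((X[:, None, :] - mu[None]) ** 2).sum(-1)).sum(0) / (D * Nk)

# ELBO averaged over data points
elbo = np.mean((q * (np.log(pi) + log_gauss(X, mu, var))).sum(1)
               - (q * np.log(q + 1e-300)).sum(1))

# Entropy sum: (A) average posterior entropy, (B) minus prior entropy,
# (C) minus expected entropy of the Gaussian observable distributions
H_q = -np.mean((q * np.log(q + 1e-300)).sum(1))
H_prior = -(pi * np.log(pi)).sum()
H_obs = (q @ (0.5 * D * np.log(2.0 * np.pi * np.e * var))).mean()
entropy_sum = H_q - H_prior - H_obs

print(elbo, entropy_sum)  # the two values agree
```

In this sketch the equality in fact holds after every M-step, because the M-step makes the parameters stationary given the current variational distributions; this mirrors the abstract's point that the identity holds at any stationary point (including saddle points and local optima), not only at a global optimum.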