The variational lower bound (a.k.a. ELBO or free energy) is the central objective for many established as well as many novel algorithms for unsupervised learning. Learning algorithms change model parameters such that the variational lower bound increases, and learning usually proceeds until the parameters have converged to values close to a stationary point of the learning dynamics. In this purely theoretical contribution, we show that, for a very large class of generative models, the variational lower bound is equal to a sum of entropies at all stationary points of learning. For standard machine learning models with one set of latent variables and one set of observed variables, the sum consists of three entropies: (A) the (average) entropy of the variational distributions, (B) the negative entropy of the model's prior distribution, and (C) the (expected) negative entropy of the observable distributions. The result applies under realistic conditions: for finite numbers of data points, at any stationary point (including saddle points), and for any family of (well-behaved) variational distributions. The class of generative models for which we show the equality to entropy sums contains many well-known generative models. As concrete examples we discuss Sigmoid Belief Networks, probabilistic PCA, and (Gaussian and non-Gaussian) mixture models. The prerequisites we use to show equality to entropy sums are relatively mild. Concretely, the distributions of a given generative model have to be of the exponential family (with constant base measure), and the model has to satisfy a parameterization criterion (which is usually fulfilled). Proving the equality of the ELBO to entropy sums at stationary points (under the stated conditions) is the main contribution of this work.
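
As a rough numerical illustration of the claim for one of the listed examples (a Gaussian mixture model), the sketch below fits a one-dimensional mixture with EM and compares the ELBO with the sum of the three entropies (A), (B) and (C). Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data, the initialization, the helper names (`e_step`, `m_step`, `elbo_and_entropy_sum`), and the choice of exact posteriors (responsibilities) as the variational distributions.

```python
import numpy as np

def log_gauss(x, mu, var):
    # Log density of a univariate Gaussian, broadcast over components.
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mu) ** 2 / var

def e_step(x, pi, mu, var):
    # log p(x_n, k) for every data point n and component k, shape (N, K).
    log_joint = np.log(pi) + log_gauss(x[:, None], mu, var)
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    # Responsibilities r_nk play the role of the variational distributions.
    return np.exp(log_joint - log_norm), log_joint

def m_step(x, r):
    # Closed-form maximum of the ELBO w.r.t. the model parameters.
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

def elbo_and_entropy_sum(x, pi, mu, var):
    r, log_joint = e_step(x, pi, mu, var)
    safe_r = np.clip(r, 1e-300, None)  # avoid log(0); 0 * log(tiny) = 0
    elbo = np.mean(np.sum(r * (log_joint - np.log(safe_r)), axis=1))
    h_q = -np.mean(np.sum(r * np.log(safe_r), axis=1))   # (A) average entropy of q
    h_prior = -np.sum(pi * np.log(pi))                   # entropy of the prior
    h_gauss = 0.5 * np.log(2 * np.pi * np.e * var)       # entropy of each Gaussian
    h_obs = np.mean(r @ h_gauss)                         # expected observable entropy
    # (A) + (B) + (C), with (B) and (C) entering as negative entropies.
    return elbo, h_q - h_prior - h_obs

# Hypothetical synthetic data: two well-separated clusters.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 1.0, 300)])

# Arbitrary initialization, deliberately far from a stationary point.
pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
elbo0, ent0 = elbo_and_entropy_sum(x, pi, mu, var)
gap_init = abs(elbo0 - ent0)

for _ in range(500):  # run EM until (numerically) stationary
    r, _ = e_step(x, pi, mu, var)
    pi, mu, var = m_step(x, r)

elbo1, ent1 = elbo_and_entropy_sum(x, pi, mu, var)
gap_final = abs(elbo1 - ent1)
print("gap at initialization:", gap_init)
print("gap after EM:         ", gap_final)
```

With these settings the gap between the ELBO and the entropy sum should be clearly nonzero at initialization and shrink to numerical precision once EM has converged, which is the behavior the abstract describes for stationary points.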