The ability of likelihood-based probabilistic models to generalize to unseen data is central to many machine learning applications, such as lossless compression. In this work, we study the generalization of a popular class of probabilistic models, the Variational Auto-Encoder (VAE). We point out the two generalization gaps that can affect the generalization ability of VAEs and show that the over-fitting phenomenon is usually dominated by the amortized inference network. Based on this observation, we propose a new training objective, inspired by the classic wake-sleep algorithm, to improve the generalization properties of amortized inference. We also demonstrate how it can improve generalization performance in the context of image modeling and lossless compression.
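The amortized inference gap mentioned above can be made concrete with a toy model where everything is tractable. The sketch below (an illustrative assumption, not the paper's actual setup) uses a one-dimensional latent Gaussian model with prior p(z) = N(0, 1) and likelihood p(x|z) = N(z, σ²), where the exact posterior is Gaussian and the ELBO has a closed form. Comparing the ELBO under a per-datapoint optimal q (the exact posterior) against a hypothetical restricted amortized encoder shows a strictly positive amortization gap:

```python
import numpy as np

def elbo(x, mu_q, s2_q, sigma2):
    """Closed-form ELBO for p(z)=N(0,1), p(x|z)=N(z, sigma2), q(z)=N(mu_q, s2_q)."""
    # E_q[log p(x|z)]: expected Gaussian log-likelihood under q
    ell = -0.5 * np.log(2 * np.pi * sigma2) - ((x - mu_q) ** 2 + s2_q) / (2 * sigma2)
    # E_q[log p(z)]: expected log-prior under q
    prior = -0.5 * np.log(2 * np.pi) - (mu_q ** 2 + s2_q) / 2
    # H[q]: entropy of the Gaussian variational posterior
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2_q)
    return ell + prior + entropy

sigma2 = 0.5
x = 2.0

# Per-datapoint optimal q is the exact posterior N(x/(1+sigma2), sigma2/(1+sigma2)),
# for which the ELBO is tight, i.e. equals log p(x).
mu_star = x / (1 + sigma2)
s2_star = sigma2 / (1 + sigma2)
elbo_opt = elbo(x, mu_star, s2_star, sigma2)

# A hypothetical amortized encoder that under-shrinks the mean (mu = 0.9 * x):
# its ELBO is lower by exactly KL(q || true posterior) > 0.
elbo_amort = elbo(x, 0.9 * x, s2_star, sigma2)
amortization_gap = elbo_opt - elbo_amort

# Marginal likelihood log p(x) = log N(x; 0, 1 + sigma2) for comparison.
log_px = -0.5 * np.log(2 * np.pi * (1 + sigma2)) - x ** 2 / (2 * (1 + sigma2))
```

In a real VAE neither quantity is available in closed form, but the same comparison can be approximated by taking the encoder's ELBO and then further optimizing the variational parameters per test point; the paper's observation is that on unseen data this gap, driven by the inference network, dominates over-fitting.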