We propose a theoretical approach to the numerical stability of training Variational AutoEncoders (VAEs). Our work is motivated by recent studies empowering VAEs to reach state-of-the-art generative results on complex image datasets. These very deep VAE architectures, as well as VAEs using more complex output distributions, exhibit a tendency to haphazardly produce high training gradients and NaN losses. The empirical fixes proposed to train them despite these limitations are neither fully theoretically grounded nor generally sufficient in practice. Building on this, we localize the source of the problem at the interface between the model's neural networks and their output probability distributions. We explain a common source of instability stemming from an incautious formulation of the encoded Normal distribution's variance, and apply the same approach to other, less obvious sources. We show that by implementing small changes to the way the Normal distributions they rely on are parameterized, VAEs can be securely trained.
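To illustrate the kind of parameterization change the abstract alludes to, the following is a minimal PyTorch sketch contrasting the common exp(log-variance) head with a bounded softplus alternative; the module name `GaussianHead`, the clamp bounds, and the `1e-6` floor are illustrative assumptions, not the paper's exact prescription.

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Maps encoder features to the mean and a stabilized std of N(mu, sigma^2)."""

    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, latent_dim)
        self.raw_scale = nn.Linear(in_dim, latent_dim)

    def forward(self, h: torch.Tensor):
        mu = self.mu(h)
        # Common but fragile: sigma = exp(0.5 * logvar) overflows to inf
        # (and yields NaN gradients) when the network emits a large logvar.
        # Safer (assumed fix, for illustration): bound the pre-activation,
        # then map through softplus so sigma stays finite and strictly positive.
        raw = torch.clamp(self.raw_scale(h), min=-10.0, max=10.0)
        sigma = nn.functional.softplus(raw) + 1e-6
        return mu, sigma

# Usage: reparameterization with the stabilized scale.
head = GaussianHead(in_dim=256, latent_dim=32)
h = torch.randn(4, 256)
mu, sigma = head(h)
z = mu + sigma * torch.randn_like(sigma)  # gradients remain finite
```

The design choice here is that softplus grows linearly rather than exponentially in its input, so a badly scaled network output degrades the fit instead of destroying the loss.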