Normalizing flows, autoregressive models, variational autoencoders (VAEs), and deep energy-based models are among competing likelihood-based frameworks for deep generative learning. Among them, VAEs have the advantage of fast and tractable sampling and easy-to-access encoding networks. However, they are currently outperformed by other models such as normalizing flows and autoregressive models. While the majority of the research in VAEs is focused on the statistical challenges, we explore the orthogonal direction of carefully designing neural architectures for hierarchical VAEs. We propose Nouveau VAE (NVAE), a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization. NVAE is equipped with a residual parameterization of Normal distributions and its training is stabilized by spectral regularization. We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, CelebA 64, and CelebA HQ datasets and it provides a strong baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ. To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256$\times$256 pixels. The source code is available at https://github.com/NVlabs/NVAE .
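The residual parameterization of Normal distributions mentioned above lets the approximate posterior at each latent group be expressed relative to the prior, which yields a simple closed-form KL term. The sketch below is a minimal illustration of that KL computation, assuming the prior at a group is N(mu_p, sigma_p^2) and the posterior is N(mu_p + delta_mu, (sigma_p * delta_sigma)^2); the function name and arguments are hypothetical, not from the NVAE codebase.

```python
import math

def residual_kl(delta_mu: float, delta_sigma: float, sigma_p: float) -> float:
    """KL( N(mu_p + delta_mu, (sigma_p * delta_sigma)^2) || N(mu_p, sigma_p^2) ).

    With the residual parameterization the prior mean mu_p cancels, so the KL
    depends only on the relative shift delta_mu, the relative scale delta_sigma,
    and the prior scale sigma_p.
    """
    return 0.5 * ((delta_mu / sigma_p) ** 2 + delta_sigma ** 2) \
        - math.log(delta_sigma) - 0.5

# When the posterior matches the prior (delta_mu = 0, delta_sigma = 1),
# the KL is exactly zero regardless of the prior's scale.
print(residual_kl(0.0, 1.0, 2.0))
```

Because the KL depends only on the residual quantities, the posterior network only has to predict how much to move away from the prior, which tends to keep the KL term small and training stable in deep hierarchies.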