Hierarchical VAEs have emerged in recent years as a reliable option for maximum likelihood estimation. However, instability issues and demanding computational requirements have hindered research progress in the area. We present simple modifications to the Very Deep VAE that make it converge up to $2.6\times$ faster, save up to $20\times$ in memory load, and improve stability during training. Despite these changes, our models achieve comparable or better negative log-likelihood performance than current state-of-the-art models on all $7$ commonly used image datasets we evaluated on. We also argue against using 5-bit benchmarks to measure hierarchical VAEs' performance, due to undesirable biases caused by the 5-bit quantization. Additionally, we empirically demonstrate that roughly $3\%$ of a hierarchical VAE's latent space dimensions are sufficient to encode most of the image information without loss of performance, opening the door to efficiently leveraging the hierarchical VAE's latent space in downstream tasks. We release our source code and models at https://github.com/Rayhane-mamah/Efficient-VDVAE.