Variational autoencoders (VAEs) are powerful deep generative models widely used to represent high-dimensional complex data through a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, the input data vectors are processed independently. Recently, a series of papers has presented different extensions of the VAE to process sequential data, which model not only the latent space but also the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks or state-space models. In this paper, we perform a literature review of these models. We introduce and discuss a general class of models, called dynamical variational autoencoders (DVAEs), which encompasses a large subset of these temporal VAE extensions. Then, we present in detail seven recently proposed DVAE models, with the aim of homogenizing the notation and presentation, as well as relating these models to existing classical temporal models. We have reimplemented those seven DVAE models and present the results of an experimental benchmark conducted on the speech analysis-resynthesis task (the PyTorch code is made publicly available). The paper concludes with a discussion of important issues concerning the DVAE class of models and future research guidelines.