The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers have presented different extensions of the VAE to process sequential data, which model not only the latent space but also the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks or state-space models. In this paper, we perform an extensive literature review of these models. Importantly, we introduce and discuss a general class of models, called Dynamical Variational Autoencoders (DVAEs), that encompasses a large subset of these temporal VAE extensions. We then present in detail seven different DVAE models recently proposed in the literature, with an effort to homogenize the notations and presentation, and to relate these models to existing classical temporal models. We reimplemented these seven DVAE models and present the results of an experimental benchmark conducted on a speech analysis-resynthesis task (the PyTorch code is made publicly available). The paper concludes with an extensive discussion of important issues concerning the DVAE class of models and future research guidelines.