Recurrent neural networks (RNNs), with richly distributed internal states and flexible non-linear transition functions, have overtaken dynamic Bayesian networks such as hidden Markov models (HMMs) in the task of modeling highly structured sequential data. Such data, for example speech and handwriting, often contain complex relationships between the underlying variational factors and the observations. The standard RNN has very limited randomness or variability in its structure, arising only from the output conditional probability model. This paper presents different ways of incorporating high-level latent random variables into an RNN to model the variability in sequential data, and a method for training such an RNN under the variational autoencoder (VAE) principle. We explore possible ways of using adversarial methods to train a variational RNN model. In contrast to competing approaches, our approach has a theoretical optimum in model training and provides better training stability. Our approach also improves the posterior approximation in the variational inference network through a separate adversarial training step. Numerical results on TIMIT speech data show that the reconstruction loss and the evidence lower bound converge to the same level and that the adversarial training loss converges to 0.
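As context for the convergence claim above, the following is a minimal sketch of the sequential evidence lower bound (ELBO) commonly used when training a variational RNN under the VAE principle; the factorization of the prior p_theta and approximate posterior q_phi shown here is an assumed VRNN-style form, not necessarily the exact objective of this paper.

\mathcal{L}(\theta, \phi; x_{1:T}) =
  \sum_{t=1}^{T} \mathbb{E}_{q_\phi(z_{\le t} \mid x_{\le t})}
  \Big[ \underbrace{\log p_\theta(x_t \mid z_{\le t}, x_{<t})}_{\text{reconstruction}}
      - \underbrace{\mathrm{KL}\big( q_\phi(z_t \mid x_{\le t}, z_{<t}) \,\|\, p_\theta(z_t \mid x_{<t}, z_{<t}) \big)}_{\text{regularization}} \Big]

Under this form of the bound, driving the KL (regularization) term toward zero, for example by matching the approximate posterior to the prior, makes the ELBO and the reconstruction term converge to the same value, which is consistent with the behavior reported on TIMIT.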