Recurrent mixture density networks are an important class of probabilistic models used extensively in sequence modeling and sequence-to-sequence mapping. In these models, the density of the target sequence at each time step is modeled by a Gaussian mixture model whose parameters are given by a recurrent neural network. In this paper, we generalize recurrent mixture density networks by defining the Gaussian mixture model on a non-linearly transformed target sequence at each time step. The non-linearly transformed space is created by a normalizing flow. We observed that this model significantly improves the fit to image sequences, as measured by the log-likelihood. We also applied the proposed model to speech and image data and observed that it has significant modeling power, outperforming other state-of-the-art methods in terms of log-likelihood.
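To make the per-time-step density concrete, the following is a minimal sketch of how a log-likelihood could be evaluated when a Gaussian mixture is placed on a flow-transformed target. It relies on the change-of-variables rule, log p(y) = log p_GMM(f(y)) + log |det ∂f/∂y|. Everything specific here is an assumption made for illustration: the elementwise map `z = y + c * tanh(y)` stands in for whatever normalizing flow is actually used, the mixture has diagonal covariance, and the mixture parameters are passed in as plain arrays, whereas in the model they would be produced by the recurrent network at each time step.

```python
import numpy as np

def flow_forward(y, c):
    """Hypothetical elementwise flow z = y + c * tanh(y); invertible for c > -1.

    Returns the transformed target and log|det dz/dy|.
    """
    t = np.tanh(y)
    z = y + c * t
    # dz/dy = 1 + c * (1 - tanh(y)^2), which is positive whenever c > -1.
    log_det = np.sum(np.log1p(c * (1.0 - t ** 2)), axis=-1)
    return z, log_det

def gmm_log_density(z, log_weights, means, log_stds):
    """Log-density of a diagonal-covariance Gaussian mixture at a point z.

    z: (D,), log_weights: (K,), means: (K, D), log_stds: (K, D).
    """
    diff = (z[None, :] - means) / np.exp(log_stds)
    comp = (-0.5 * np.sum(diff ** 2, axis=-1)
            - np.sum(log_stds, axis=-1)
            - 0.5 * z.shape[0] * np.log(2.0 * np.pi))
    a = log_weights + comp
    m = np.max(a)  # log-sum-exp over components, shifted for stability
    return m + np.log(np.sum(np.exp(a - m)))

def step_log_likelihood(y, c, log_weights, means, log_stds):
    """Log p(y) at one time step: mixture density of z plus the flow's log-Jacobian."""
    z, log_det = flow_forward(y, c)
    return gmm_log_density(z, log_weights, means, log_stds) + log_det
```

With `c = 0` the flow is the identity and the expression reduces to the ordinary mixture density of a recurrent mixture density network, which is the sense in which the transformed model generalizes it. The sequence log-likelihood would be the sum of these per-step terms.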