Weighted finite automata (WFAs) have been widely applied in many fields. One of the classic problems for WFAs is estimating probability distributions over sequences of discrete symbols. Although WFAs have been extended to continuous input data, yielding continuous WFAs (CWFAs), it remains unclear how to approximate density functions over sequences of continuous random variables with WFA-based models, both because of the model's limited expressiveness and because approximating density functions with CWFAs is not known to be tractable. In this paper, we first propose a nonlinear extension of the CWFA model to improve its expressiveness; we refer to it as the nonlinear continuous WFA (NCWFA). We then leverage RNADE, a well-known neural-network-based density estimator, and propose the RNADE-NCWFA model, which computes a density function by design. We show that this model is strictly more expressive than the Gaussian HMM, which CWFAs cannot approximate. Empirically, we conduct a synthetic experiment on data generated by a Gaussian HMM, focusing on the model's ability to estimate densities for sequences of varying lengths, including sequences longer than those seen in training. We observe that our model performs best among the compared baseline methods.
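For readers unfamiliar with the two reference points named above, the following is a minimal sketch, not the RNADE-NCWFA model itself. It shows the standard way a WFA scores a discrete sequence (an initial vector, one transition matrix per symbol, and a final vector) and the standard forward recursion that gives the density of a one-dimensional Gaussian HMM on a real-valued sequence of any length. The function names `wfa_weight` and `gaussian_hmm_density` and all toy parameter values are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def wfa_weight(alpha, transitions, omega, sequence):
    """Weight a WFA assigns to a discrete sequence x1..xn:
    alpha^T A_{x1} ... A_{xn} omega."""
    state = np.asarray(alpha, dtype=float)
    for symbol in sequence:
        state = state @ transitions[symbol]
    return float(state @ omega)


def gaussian_hmm_density(pi, T, means, stds, xs):
    """Density of a 1-D Gaussian HMM on a real-valued sequence,
    computed with the forward recursion."""
    belief = np.asarray(pi, dtype=float)
    for x in xs:
        # Per-state Gaussian emission density at the observation x.
        emission = np.exp(-0.5 * ((x - means) / stds) ** 2) / (
            stds * np.sqrt(2.0 * np.pi)
        )
        # Weight each state by its emission, then move to the next state.
        belief = (belief * emission) @ T
    return float(belief.sum())


# Hypothetical one-state WFA over the alphabet {a, b}; it stops with
# probability 0.5, so its weights form a distribution over strings.
alpha = np.array([1.0])
omega = np.array([0.5])
transitions = {"a": np.array([[0.3]]), "b": np.array([[0.2]])}

# Hypothetical two-state Gaussian HMM (illustrative values only).
pi = np.array([0.6, 0.4])
T = np.array([[0.9, 0.1], [0.2, 0.8]])
means = np.array([-1.0, 2.0])
stds = np.array([0.5, 1.0])

if __name__ == "__main__":
    print(wfa_weight(alpha, transitions, omega, "ab"))
    # The same parameters score sequences of different lengths.
    print(gaussian_hmm_density(pi, T, means, stds, [0.1, -0.8]))
    print(gaussian_hmm_density(pi, T, means, stds, [0.1, -0.8, 2.3, 1.9]))
```

Because the Gaussian HMM recursion applies the same parameters at every step, it can be evaluated on sequences longer than any seen during fitting, which is the setting the synthetic experiment examines.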