NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism

Although deep learning and end-to-end models have been widely used and have shown their superiority in automatic speech recognition (ASR) and text-to-speech (TTS) synthesis, state-of-the-art forced alignment (FA) models are still based on hidden Markov models (HMMs). HMMs have a limited view of contextual information and are developed with long pipelines, leading to error accumulation and unsatisfactory performance. Inspired by the capability of the attention mechanism to capture long-term contextual information and learn alignments in ASR and TTS, we propose a neural network based end-to-end forced aligner called NeuFA, in which a novel bidirectional attention mechanism plays an essential role. NeuFA integrates the alignment learning of both the ASR and TTS tasks in a unified framework by learning bidirectional alignment information from a shared attention matrix in the proposed bidirectional attention mechanism. Alignments are extracted from the learnt attention weights and optimized by the ASR, TTS and FA tasks in a multi-task learning manner. Experimental results demonstrate the effectiveness of the proposed model: compared with the state-of-the-art HMM-based model, the mean absolute error on the test set drops from 25.8 ms to 23.7 ms at the word level and from 17.0 ms to 15.7 ms at the phoneme level.
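The idea of reading two alignment directions from one shared attention matrix can be illustrated with a minimal sketch. This is not NeuFA's implementation: the dot-product scoring, the function name `bidirectional_attention`, and the toy inputs are assumptions made purely for illustration. The sketch builds a single score matrix between text tokens and speech frames, then normalizes it along each axis to obtain one set of weights per direction.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def bidirectional_attention(text_keys, speech_keys):
    """Derive two sets of attention weights from one shared score matrix.

    Hypothetical helper: scores are plain dot products, which is an
    assumption and not necessarily what NeuFA uses.
    """
    # Shared matrix: scores[i][j] compares text token i with speech frame j.
    scores = [
        [sum(a * b for a, b in zip(token, frame)) for frame in speech_keys]
        for token in text_keys
    ]
    # Row-wise normalization: each text token attends over speech frames.
    over_frames = [softmax(row) for row in scores]
    # Column-wise normalization: each speech frame attends over text tokens.
    over_tokens = [softmax(list(column)) for column in zip(*scores)]
    return over_frames, over_tokens


# Toy example: 2 text tokens, 3 speech frames, 2-dimensional keys.
tokens = [[1.0, 0.0], [0.0, 1.0]]
frames = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
over_frames, over_tokens = bidirectional_attention(tokens, frames)
```

Because both weight sets come from the same scores, a training signal applied in either direction updates the same underlying alignment, which is the property the abstract attributes to the shared attention matrix.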

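Related content

Hidden Markov model

A hidden Markov model (HMM) is a statistical model that describes a Markov process with hidden, unknown parameters. The difficulty lies in determining the hidden parameters of the process from the observable ones; these parameters are then used for further analysis, such as pattern recognition. An HMM is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.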
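Recovering hidden states from observations can be made concrete with the Viterbi algorithm, a standard way to find the most likely hidden state sequence of an HMM. The sketch below is a minimal illustration only: the two-state model and all probabilities in `START`, `TRANS`, and `EMIT` are made-up toy values, and plain probability products are used instead of log probabilities because the sequence is short.

```python
def viterbi(obs, start, trans, emit):
    """Return the most likely hidden state sequence for the observations.

    start[s]    : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability of state s emitting observation o
    """
    n_states = len(start)
    # Best path probability ending in each state after the first observation.
    best = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, best, pointers = best, [], []
        for s in range(n_states):
            p = max(range(n_states), key=lambda r: prev[r] * trans[r][s])
            pointers.append(p)
            best.append(prev[p] * trans[p][s] * emit[s][o])
        back.append(pointers)
    # Trace back from the most probable final state.
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy model (assumed values): 2 hidden states, 3 observation symbols.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

print(viterbi([0, 1, 2], START, TRANS, EMIT))  # prints [0, 0, 1]
```

For longer sequences the products underflow, so practical implementations usually sum log probabilities instead.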
