Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept

With the advent of direct models in automatic speech recognition (ASR), the formerly prevalent frame-wise acoustic modeling based on hidden Markov models (HMM) diversified into a number of modeling architectures, such as encoder-decoder attention models, transducer models and segmental models (direct HMM). Transducer models keep a frame-level model definition, whereas segmental models are defined directly on the level of label segments. (Soft-)attention-based models avoid explicit alignment, but transducer and segmental approaches do model alignment internally, either by segment hypotheses or, more implicitly, by emitting so-called blank symbols. In this work, we prove that the widely used class of RNN-Transducer models and segmental models (direct HMM) are equivalent and therefore have equal modeling power. We show that blank probabilities translate into segment length probabilities and vice versa. In addition, we provide initial experiments investigating decoding and beam-pruning, comparing time-synchronous and label-/segment-synchronous search strategies and their properties using the same underlying model.
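The statement that blank probabilities translate into segment length probabilities can be illustrated with a small numerical sketch. It is not the proof from the paper. It assumes a simplified setting in which, within one segment, each frame either emits blank or ends the segment by emitting a label, and the blank probabilities for those frames are given as a plain list. The function names `blank_to_length` and `length_to_blank` are hypothetical and chosen only for this illustration.

```python
from typing import List


def blank_to_length(blank_probs: List[float]) -> List[float]:
    """Map per-frame blank probabilities to segment length probabilities.

    blank_probs[k] is the (assumed) probability of emitting blank at the
    (k+1)-th frame after the previous segment ended. A segment has length
    k+1 if blank was emitted on the first k frames and a label on frame k+1.
    """
    length_probs = []
    survive = 1.0  # probability that the segment has not ended yet
    for b in blank_probs:
        length_probs.append(survive * (1.0 - b))
        survive *= b
    return length_probs


def length_to_blank(length_probs: List[float]) -> List[float]:
    """Inverse mapping: recover blank probabilities from length probabilities.

    The blank probability at a frame is one minus the chance of ending the
    segment there, given that it has not ended on an earlier frame.
    """
    blank_probs = []
    survive = 1.0
    for q in length_probs:
        if survive <= 0.0:
            raise ValueError("no probability mass left for longer segments")
        blank_probs.append(1.0 - q / survive)
        survive -= q
    return blank_probs


if __name__ == "__main__":
    blanks = [0.5, 0.25, 0.8]
    lengths = blank_to_length(blanks)
    print("length probabilities:", lengths)
    print("recovered blank probabilities:", length_to_blank(lengths))
```

In this toy setting the two mappings are inverses of each other up to floating-point rounding, which mirrors the "vice versa" in the abstract. The actual models condition these probabilities on encoder output and label context, which the sketch leaves out.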

