This paper presents novel Weighted Finite-State Transducer (WFST) topologies that implement Connectionist Temporal Classification (CTC)-like algorithms for automatic speech recognition. Three new CTC variants are proposed: (1) "compact-CTC", in which direct transitions between units are replaced with <epsilon> back-off transitions; (2) "minimal-CTC", which adds <blank> self-loops only when used in WFST composition; and (3) "selfless-CTC", which disallows self-loops for non-blank units. Compact-CTC yields decoding graphs 1.5 times smaller and halves memory consumption when training CTC models with the LF-MMI objective, without hurting recognition accuracy. Minimal-CTC reduces graph size and memory consumption by two and four times, respectively, at the cost of a small accuracy drop. Selfless-CTC can improve accuracy for models with wide context windows.
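To make the baseline concrete, the following is a minimal sketch, not taken from the paper, of the standard ("vanilla") CTC topology as a WFST represented by a plain arc list; the function name `ctc_topo` and the tuple encoding are illustrative assumptions. It shows where the quadratic blow-up comes from: every non-blank unit gets a direct arc to every other unit, which is exactly what compact-CTC replaces with <epsilon> back-off transitions through a shared state.

```python
def ctc_topo(num_units):
    """Sketch of the standard CTC topology over units 1..num_units,
    with 0 standing for <blank>. Each unit u owns one state with a
    self-loop (repeated frames collapse into one emission). Arcs are
    (src_state, dst_state, input_label, output_label) tuples; an
    output label of 0 denotes epsilon (nothing emitted)."""
    arcs = []
    arcs.append((0, 0, 0, 0))              # <blank> self-loop at the start state
    for u in range(1, num_units + 1):
        arcs.append((0, u, u, u))          # enter unit u, emitting it once
        arcs.append((u, u, u, 0))          # self-loop: repeated frames of u
        arcs.append((u, 0, 0, 0))          # return to the start state via <blank>
        for v in range(1, num_units + 1):
            if v != u:
                arcs.append((u, v, v, v))  # direct unit-to-unit transition
    return arcs

# Arc count is 1 + N*(N + 2): the N*(N-1) direct transitions dominate
# for large vocabularies, motivating the compact-CTC back-off scheme.
print(len(ctc_topo(3)))  # → 16
```

In this toy encoding, dropping the inner `for v` loop and instead routing unit-to-unit moves through the start state with <epsilon> arcs would bring the arc count from O(N²) down to O(N), which is the structural idea behind the compact-CTC variant.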