This paper presents novel Weighted Finite-State Transducer (WFST) topologies to implement Connectionist Temporal Classification (CTC)-like algorithms for automatic speech recognition. Three new CTC variants are proposed: (1) the "compact-CTC", in which direct transitions between units are replaced with <epsilon> back-off transitions; (2) the "minimal-CTC", which only adds <blank> self-loops when used in WFST composition; and (3) the "selfless-CTC", which disallows self-loops for non-blank units. The new CTC variants offer several benefits, such as reducing the decoding graph size and the GPU memory required for training while preserving model accuracy.
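As a rough illustration of how these topologies differ, the sketch below (not taken from the paper) encodes a standard CTC topology and the selfless variant as plain Python arc lists for a toy vocabulary; the function names, the arc representation, and the omission of start/final-state handling are simplifying assumptions made for brevity.

```python
# Illustrative sketch (not from the paper): arc lists for a standard CTC
# topology versus a "selfless"-style variant, for a toy vocabulary.
# Arcs are (src_state, dst_state, input_label, output_label);
# label 0 is <blank>, EPS marks an epsilon output.

EPS = "<eps>"
BLANK = 0

def standard_ctc_topo(num_units):
    """Fully connected CTC topology: one state per label (incl. blank).
    Self-loops absorb repeated frames of the same label."""
    arcs = []
    labels = range(num_units + 1)  # 0 = blank, 1..num_units = real units
    for src in labels:
        for lab in labels:
            if lab == src:
                # Repeat of the current label -> emit nothing.
                arcs.append((src, src, lab, EPS))
            else:
                # Switch to a new label -> emit it (blank emits nothing).
                out = EPS if lab == BLANK else lab
                arcs.append((src, lab, lab, out))
    return arcs

def selfless_ctc_topo(num_units):
    """'selfless-CTC'-style variant: drop self-loops on non-blank units,
    keeping only the <blank> self-loop."""
    return [(s, d, i, o) for (s, d, i, o) in standard_ctc_topo(num_units)
            if not (s == d and i != BLANK)]

if __name__ == "__main__":
    print(len(standard_ctc_topo(3)), "arcs in the standard topology")
    print(len(selfless_ctc_topo(3)), "arcs in the selfless variant")
```

Even on this toy vocabulary the selfless variant has fewer arcs, which hints at how the proposed topologies shrink the composed decoding graph.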