Inverse text normalization (ITN) converts spoken-domain automatic speech recognition (ASR) output into written-domain text, improving its readability. Many state-of-the-art ITN systems use hand-written weighted finite-state transducer (WFST) grammars because the task has extremely low tolerance for unrecoverable errors. We introduce an open-source, WFST-based Python library for ITN that enables a seamless path from development to production. We describe the specification of ITN grammar rules for English, but the library can be adapted to other languages. It can also be used for written-to-spoken text normalization. We evaluate the NeMo ITN library using a modified version of the Google Text Normalization dataset.
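To make the spoken-to-written mapping concrete, the following is a minimal sketch of what an ITN rule for small cardinal numbers does. It is a plain-Python toy, not the NeMo API and not a WFST grammar; the function name `spoken_to_written` and the lookup tables are assumptions introduced purely for illustration.

```python
# Toy illustration only: plain-Python lookup rules, NOT the library's API
# and not a weighted finite-state transducer.

# Spoken forms for 0-19 (hypothetical table for this sketch).
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
    "ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
    "fourteen": 14, "fifteen": 15, "sixteen": 16,
    "seventeen": 17, "eighteen": 18, "nineteen": 19,
}

# Spoken forms for the multiples of ten from 20 to 90.
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def spoken_to_written(text: str) -> str:
    """Rewrite spoken cardinals from 0 to 99 as digits; keep other tokens."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in TENS:
            value = TENS[tok]
            # Absorb a following unit 1-9, e.g. "twenty" + "three" -> 23.
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt in UNITS and 1 <= UNITS[nxt] <= 9:
                value += UNITS[nxt]
                i += 1
            out.append(str(value))
        elif tok in UNITS:
            out.append(str(UNITS[tok]))
        else:
            out.append(tok)
        i += 1
    return " ".join(out)


print(spoken_to_written("i have twenty three apples"))
print(spoken_to_written("set a timer for fifteen minutes"))
```

Because this sketch applies its rules without any context, it would also rewrite "no one" as "no 1". Mistakes of this kind cannot be undone downstream, which is the sort of unrecoverable error that carefully hand-written grammars are meant to avoid.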