While there have been several contributions exploring state-of-the-art techniques for text normalization, the problem of inverse text normalization (ITN) remains relatively unexplored. The best-known approaches leverage finite state transducer (FST) based models which rely on manually curated rules and are hence not scalable. We propose an efficient and robust neural solution for ITN that leverages transformer-based seq2seq models and FST-based text normalization techniques for data preparation. We show that this solution can be easily extended to other languages without the need for a linguistic expert to manually curate the rules. We then present a hybrid framework that integrates neural ITN with an FST to overcome common recoverable errors in production environments. Our empirical evaluations show that the proposed solution minimizes incorrect perturbations (insertions, deletions and substitutions) to ASR output and maintains high quality even on out-of-domain data. A transformer-based model infused with pretraining consistently achieves a lower word error rate (WER) across several datasets and outperforms baselines on English, Spanish, German and Italian datasets.
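To make the data-preparation idea concrete, here is a minimal sketch of generating (spoken, written) training pairs by running written-form text through an FST-based text normalizer. It assumes NVIDIA NeMo's text-processing package is installed; the use of its `Normalizer` here is an illustrative assumption, not necessarily the paper's exact pipeline.

```python
# Sketch: build (source, target) pairs for a seq2seq ITN model by
# verbalizing written-form text with an FST-based text normalizer,
# then training the model on the reverse direction (spoken -> written).
from nemo_text_processing.text_normalization.normalize import Normalizer

# Assumption: an English normalizer; the paper's setup covers more languages.
normalizer = Normalizer(input_case="cased", lang="en")

def make_itn_pair(written: str) -> tuple[str, str]:
    """Return a (spoken, written) pair: the model learns spoken -> written."""
    spoken = normalizer.normalize(written, verbose=False)
    return spoken, written

pairs = [make_itn_pair(s) for s in [
    "The meeting is at 3:30 pm on 4/5/2021.",
    "It costs $12.50.",
]]
# Each pair teaches the transformer to map spoken-form ASR output back to
# written form, e.g. "twelve dollars fifty cents" -> "$12.50".
```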
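The hybrid framework can be pictured as a guarded fallback: accept the neural output only when it passes cheap sanity checks, otherwise defer to the rule-based FST. The sketch below is hypothetical; `neural_itn`, `fst_itn` and the over-deletion check are illustrative stand-ins for the paper's actual integration criteria.

```python
from typing import Callable

def hybrid_itn(
    spoken: str,
    neural_itn: Callable[[str], str],  # hypothetical: transformer seq2seq model
    fst_itn: Callable[[str], str],     # hypothetical: rule-based FST fallback
) -> str:
    """Run neural ITN, but fall back to the FST when the neural output
    shows a recoverable error such as heavy deletion of input words."""
    candidate = neural_itn(spoken)
    # Illustrative check: ITN should only rewrite spoken numbers, dates, etc.,
    # so the count of plain alphabetic tokens should not shrink drastically.
    src_words = [t for t in spoken.split() if t.isalpha()]
    out_words = [t for t in candidate.split() if t.isalpha()]
    if len(out_words) < 0.5 * len(src_words):
        return fst_itn(spoken)  # suspected over-deletion: use the safe FST path
    return candidate
```

In a production setting this kind of guard bounds the damage a misbehaving neural model can do, since the FST output is always a conservative, rule-correct fallback.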