A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

Non-autoregressive (NAR) models generate multiple outputs of a sequence simultaneously, which significantly reduces inference time, though at the cost of an accuracy drop compared to autoregressive (AR) baselines. Because of their great potential for real-time applications, an increasing number of NAR models have been explored in different fields to narrow the performance gap with AR models. In this work, we conduct a comparative study of various NAR modeling methods for end-to-end automatic speech recognition (ASR). Experiments are performed in a state-of-the-art setting using ESPnet. The results on various tasks provide interesting findings for understanding NAR ASR, such as the accuracy-speed trade-off and robustness against long-form utterances. We also show that the techniques can be combined for further improvement and applied to NAR end-to-end speech translation. All the implementations are publicly available to encourage further research in NAR speech processing.
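To make the AR/NAR distinction concrete, here is a minimal sketch, assuming greedy decoding in both cases. The NAR side uses greedy CTC decoding, a well-known NAR approach to ASR chosen here purely for illustration (the abstract does not say which methods are compared). The function names, the blank index 0, and the toy scores are all hypothetical and are not ESPnet's API. The point is structural: the NAR decoder picks every frame's label independently in one pass, while the AR decoder must be called once per output token, in sequence.

```python
from typing import Callable, List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> List[int]:
    """NAR sketch: each frame's label is chosen independently of the other
    outputs, so all argmaxes could run in parallel; repeats are then merged
    and blanks dropped."""
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    tokens: List[int] = []
    prev = None
    for label in best:
        # Keep a label only if it is not blank and differs from the previous frame.
        if label != prev and label != BLANK:
            tokens.append(label)
        prev = label
    return tokens


def ar_greedy_decode(step: Callable[[List[int]], Sequence[float]],
                     eos: int, max_len: int) -> List[int]:
    """AR sketch: each token is conditioned on all previous tokens, so the
    hypothetical `step` decoder is called sequentially, once per token."""
    tokens: List[int] = []
    for _ in range(max_len):
        scores = step(tokens)
        nxt = max(range(len(scores)), key=scores.__getitem__)
        if nxt == eos:
            break
        tokens.append(nxt)
    return tokens


# Toy usage with vocabulary {0: blank, 1: 'a', 2: 'b', 3: <eos>} (made-up scores).
frames = [
    [0.1, 0.8, 0.1],    # 'a'
    [0.1, 0.7, 0.2],    # 'a' again: merged with the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # 'b'
    [0.8, 0.1, 0.1],    # blank
    [0.1, 0.1, 0.8],    # 'b' after a blank: kept as a second 'b'
]
print(ctc_greedy_decode(frames))  # [1, 2, 2]

target, EOS, calls = [1, 2, 2], 3, []


def toy_step(prefix: List[int]) -> List[float]:
    calls.append(len(prefix))  # record each sequential decoder call
    nxt = target[len(prefix)] if len(prefix) < len(target) else EOS
    return [1.0 if i == nxt else 0.0 for i in range(4)]


print(ar_greedy_decode(toy_step, eos=EOS, max_len=10))  # [1, 2, 2]
print(len(calls))  # 4: three tokens plus the <eos> step
```

Both decoders produce the same output here, but the AR loop needs one dependent decoder call per token (plus one for end-of-sequence), whereas the CTC pass has no dependency between outputs. This is the source of the speed gain. The accuracy cost comes from the same independence: the NAR outputs cannot condition on each other.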

