Training end-to-end speech-to-text models on mobile phones

Training state-of-the-art speech-to-text (STT) models on mobile devices is challenging because these devices have limited resources compared with a server environment. In addition, these models are trained on generic datasets that do not fully capture user-specific characteristics. Recently, on-device personalization techniques have made strides in mitigating this problem. Although many works have explored the effectiveness of on-device personalization, most of their findings are limited to simulation settings or a single smartphone. In this paper, we develop a framework for training end-to-end models on mobile phones and explain it in detail. For simplicity, we consider a model trained with the connectionist temporal classification (CTC) loss. We evaluate the framework on various mobile phones from different brands and report the results. We provide evidence that fine-tuning the models and choosing the right hyperparameter values involves a trade-off between the lowest achievable word error rate (WER), on-device training time, and memory consumption. Managing this trade-off is therefore vital for successfully deploying on-device training in a resource-limited environment such as a mobile phone. Using training sets from speakers with different accents, we record a 7.6% decrease in average WER. We also report the associated computational costs, measured in real time on the phones, in terms of time, memory usage, and CPU utilization.
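
The results above are reported as word error rate (WER). As a reference point only, the sketch below computes WER in the standard way: the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. It is not taken from the paper's framework. Whitespace tokenization and the hypothetical `average_wer` helper (an unweighted mean of per-utterance scores) are assumptions; the abstract does not say how its average WER is computed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions to reach an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def average_wer(pairs):
    """Hypothetical helper: unweighted mean of per-utterance WER over (ref, hyp) pairs."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # One deletion ("on") and one substitution ("lights" -> "light") over 5 words.
    print(word_error_rate("turn on the kitchen lights", "turn the kitchen light"))  # 0.4
    print(average_wer([("a b", "a b"), ("a b c d", "a x c d")]))  # 0.125
```

Averaging per-utterance scores weights short and long utterances equally; a corpus-level WER (total errors divided by total reference words) is a common alternative and can give a different number on the same data.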
