Continued improvements in machine learning techniques offer exciting new opportunities through the use of larger models and larger training datasets. However, there is a growing need to offer these new capabilities on-board low-powered devices such as smartphones, wearables and other embedded environments where only low memory is available. Towards this, we consider methods to reduce the size of Conformer-based speech recognition models, which typically require more than 100M parameters, down to just $5$M parameters while minimizing impact on model quality. Such a model allows us to achieve always-on ambient speech recognition on edge devices with low-memory neural processors. We propose model weight reuse at different levels within our model architecture: (i) repeating full conformer block layers, (ii) sharing specific conformer modules across layers, (iii) sharing sub-components per conformer module, and (iv) sharing decomposed sub-component weights after low-rank decomposition. By sharing weights at different levels of our model, we can retain the full model in-memory while increasing the number of virtual transformations applied to the input. Through a series of ablation studies and evaluations, we find that with weight sharing and a low-rank architecture, we can achieve WERs of 2.84 and 2.94 on Librispeech dev-clean and test-clean respectively with a $5$M parameter model.
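
The following is a minimal sketch, in PyTorch-style Python, of two of the weight-reuse ideas described above: repeating a single shared block to form several "virtual" layers (i), and low-rank decomposition of a weight matrix (iv). The class and parameter names (LowRankLinear, SharedBlockStack, rank, num_repeats) are hypothetical and chosen for illustration; they are not the paper's actual implementation.

    import torch
    import torch.nn as nn

    class LowRankLinear(nn.Module):
        """Approximates a (d_in x d_out) weight with two rank-r factors."""
        def __init__(self, d_in, d_out, rank):
            super().__init__()
            self.down = nn.Linear(d_in, rank, bias=False)   # d_in -> r
            self.up = nn.Linear(rank, d_out, bias=True)     # r -> d_out

        def forward(self, x):
            return self.up(self.down(x))

    class SharedBlockStack(nn.Module):
        """Applies the same block num_repeats times: one set of stored
        weights, several virtual transformations of the input."""
        def __init__(self, block, num_repeats):
            super().__init__()
            self.block = block            # stored once, reused each repeat
            self.num_repeats = num_repeats

        def forward(self, x):
            for _ in range(self.num_repeats):
                x = self.block(x)
            return x

    # Toy usage: a low-rank feed-forward "block" repeated 4 times.
    block = nn.Sequential(LowRankLinear(256, 256, rank=32), nn.ReLU())
    stack = SharedBlockStack(block, num_repeats=4)
    out = stack(torch.randn(8, 100, 256))  # (batch, time, feature)

In this sketch the parameter count is fixed by the single shared block, while the effective depth grows with num_repeats; the low-rank factorization further shrinks each block from d_in*d_out to rank*(d_in+d_out) weights.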