From wearables to powerful smart devices, modern automatic speech recognition (ASR) models run on a variety of edge devices with different computational budgets. To navigate the Pareto front of model accuracy vs. model size, researchers face a dilemma: optimizing accuracy requires training and fine-tuning a model for each individual edge device, while the total training GPU-hours must stay tractable. In this paper, we propose Omni-sparsity DNN, where a single neural network can be pruned to generate optimized models for a large range of model sizes. We develop training strategies for the Omni-sparsity DNN that allow it to find models along the Pareto front of word-error-rate (WER) vs. model size, while keeping the training GPU-hours to no more than those of training a single model. We demonstrate the Omni-sparsity DNN with streaming E2E ASR models. Our results show large savings in training time and resources, with similar or better accuracy on LibriSpeech compared to individually pruned sparse models: 2%-6.6% better WER on Test-other.
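As a rough illustration of the core idea (not the paper's actual training or pruning procedure, which the abstract does not specify), the sketch below shows how one shared weight matrix can be magnitude-pruned to several sparsity levels, yielding sub-models of different sizes from a single set of trained weights; the function name and sparsity levels are illustrative assumptions.

```python
import torch

def prune_by_magnitude(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude entries so that roughly
    `sparsity` fraction of the weights are removed (illustrative only)."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight.clone()
    # Threshold = k-th smallest absolute value over all weights.
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).to(weight.dtype)
    return weight * mask

# One shared weight matrix (a layer of the single trained network),
# pruned to several sparsity levels to produce models of different sizes.
shared_weight = torch.randn(512, 512)
for sparsity in (0.5, 0.7, 0.9):
    pruned = prune_by_magnitude(shared_weight, sparsity)
    kept = (pruned != 0).float().mean().item()
    print(f"sparsity {sparsity:.0%}: nonzero fraction = {kept:.2f}")
```

In this toy view, each target device simply picks the sparsity level that fits its budget; the paper's contribution is training the single network so that every such pruned variant lands on or near the WER vs. model-size Pareto front.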