Mixture-of-experts based acoustic models with dynamic routing mechanisms have shown promising results for speech recognition. The design of the router architecture is key to achieving large model capacity with high computational efficiency. Our previous work, SpeechMoE, uses only a local grapheme embedding to help routers make routing decisions. To further improve speech recognition performance across varying domains and accents, we propose a new router architecture that integrates additional global domain and accent embeddings into the router input to improve adaptability. Experimental results show that the proposed SpeechMoE2 achieves a lower character error rate (CER) than SpeechMoE with a comparable number of parameters on both multi-domain and multi-accent tasks. Specifically, the proposed method provides 1.6% to 4.8% relative CER improvement on the multi-domain task and 1.9% to 17.7% relative CER improvement on the multi-accent task. In addition, increasing the number of experts yields consistent performance improvements while keeping the computational cost constant.
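The router idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the grapheme, domain, and accent embeddings are simply concatenated, that the router is a single linear layer followed by a softmax, and that only the top-scoring expert is activated per frame. None of these details are stated in the abstract. All names (`make_router`, `route`) and all dimensions are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_router(input_dim, num_experts, seed=0):
    """Hypothetical router: one randomly initialised weight row per expert."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
            for _ in range(num_experts)]


def route(weights, grapheme_emb, domain_emb, accent_emb):
    """Pick one expert from local and global embeddings.

    Assumption: the router input is the concatenation of the local
    grapheme embedding with the global domain and accent embeddings.
    """
    router_input = grapheme_emb + domain_emb + accent_emb  # list concatenation
    scores = [sum(w * x for w, x in zip(row, router_input)) for row in weights]
    probs = softmax(scores)
    # Assumption: top-1 routing, so exactly one expert runs per frame.
    expert = max(range(len(probs)), key=probs.__getitem__)
    return expert, probs


# Illustrative embeddings with made-up sizes and values.
grapheme = [0.2, -0.1, 0.4, 0.0]
domain = [1.0, 0.0]
accent = [0.0, 1.0, 0.0]
input_dim = len(grapheme) + len(domain) + len(accent)

router_4 = make_router(input_dim, num_experts=4)
router_8 = make_router(input_dim, num_experts=8)
expert_4, probs_4 = route(router_4, grapheme, domain, accent)
expert_8, probs_8 = route(router_8, grapheme, domain, accent)
```

Under the top-1 assumption, moving from 4 to 8 experts enlarges only the router's output layer. Each frame is still processed by a single expert, which is one plausible reading of why the expert count can grow while the computational cost stays roughly constant.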