Keyword spotting (KWS) on mobile devices generally requires a small memory footprint. However, most current models still rely on a large number of parameters to achieve good performance. To address this problem, this paper proposes a separable temporal convolutional neural network with attention that has a small number of parameters. By combining temporal convolution with an attention mechanism, the model uses only 32.2K parameters while maintaining high performance. The proposed model achieves 95.7% accuracy on the Google Speech Commands dataset, which is close to the performance of Res15 (239K parameters), the current state-of-the-art model in KWS.
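To illustrate why a separable temporal convolution needs fewer parameters than an ordinary one, the following minimal sketch counts the weights of a single 1D convolution layer in both forms. The function names and the layer sizes (64 channels, kernel size 9) are assumptions chosen for illustration; they are not taken from the proposed model.

```python
def standard_conv1d_params(c_in, c_out, kernel_size):
    """Weights in an ordinary 1D (temporal) convolution, bias omitted."""
    return kernel_size * c_in * c_out


def separable_conv1d_params(c_in, c_out, kernel_size):
    """Weights in a depthwise separable 1D convolution, bias omitted.

    Depthwise step: one filter of length kernel_size per input channel.
    Pointwise step: a 1x1 convolution that mixes channels.
    """
    depthwise = kernel_size * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only (not the paper's configuration).
c_in, c_out, k = 64, 64, 9
standard = standard_conv1d_params(c_in, c_out, k)
separable = separable_conv1d_params(c_in, c_out, k)
print(f"standard: {standard}, separable: {separable}")
```

For these assumed sizes the separable layer needs 4,672 weights against 36,864 for the standard layer; the saving grows with the kernel size and the number of output channels.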