Deep neural network (DNN)-based speech enhancement models have attracted extensive attention due to their promising performance. However, it is difficult to deploy a powerful DNN in real-time applications because of its high computational cost. Typical compression methods such as pruning and quantization do not make good use of the characteristics of the data. In this paper, we introduce the Skip-RNN strategy into speech enhancement models with parallel RNNs. The states of the RNNs are updated intermittently without interrupting the update of the output mask, which leads to a significant reduction in computational load without evident audio artifacts. To better leverage the difference between speech and noise, we further regularize the skipping strategy with voice activity detection (VAD) guidance, saving additional computation. Experiments on a high-performance speech enhancement model, the dual-path convolutional recurrent network (DPCRN), show the superiority of our strategy over alternatives such as network pruning or directly training a smaller model. We also validate the generalizability of the proposed strategy on two other competitive speech enhancement models.
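The core idea can be illustrated with a minimal, simplified sketch (not the authors' implementation): a binary gate decides, frame by frame, whether the recurrent state is refreshed or simply copied forward, while the enhancement mask is still emitted at every frame from the (possibly stale) state. The class and parameter names (`SkipGRUSketch`, `mask_head`, the 0.5 threshold) are illustrative assumptions; the actual Skip-RNN formulation additionally accumulates the update probability over skipped steps and trains the binary decision with a straight-through estimator.

```python
# Minimal sketch of intermittent state updates with per-frame mask output.
# Assumed names and shapes; not the paper's code.
import torch
import torch.nn as nn


class SkipGRUSketch(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        # Predicts the probability of updating the state from the previous state.
        self.update_gate = nn.Linear(hidden_size, 1)
        # Maps the state to a per-frequency enhancement mask.
        self.mask_head = nn.Linear(hidden_size, input_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, input_size); returns a mask of the same shape.
        batch, frames, _ = x.shape
        h = x.new_zeros(batch, self.cell.hidden_size)
        masks = []
        for t in range(frames):
            p_update = torch.sigmoid(self.update_gate(h))    # (batch, 1)
            update = (p_update > 0.5).float()                 # hard skip/refresh decision
            h_new = self.cell(x[:, t], h)                     # candidate new state
            h = update * h_new + (1.0 - update) * h           # copy state forward when skipping
            masks.append(torch.sigmoid(self.mask_head(h)))    # mask is emitted every frame
        return torch.stack(masks, dim=1)


if __name__ == "__main__":
    model = SkipGRUSketch(input_size=257, hidden_size=128)
    noisy_spec = torch.randn(2, 100, 257)  # (batch, frames, frequency bins)
    mask = model(noisy_spec)
    print(mask.shape)                      # torch.Size([2, 100, 257])
```

In this sketch, frames whose state update is skipped still receive a mask, so the output stream is never interrupted; the saving comes from avoiding the recurrent computation on those frames, and a VAD-guided regularizer can further bias the gate toward skipping during non-speech segments.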