In real-world scenarios, it is often necessary to control the inference speed of speech enhancement systems under different conditions. To this end, we propose a stage-wise adaptive inference approach with an early exit mechanism for progressive speech enhancement. Specifically, at each stage, once the spectral distance between the outputs of adjacent stages falls below an empirically preset threshold, inference terminates and the current estimate is output, which effectively accelerates inference. To further improve on the performance of existing speech enhancement systems, we propose PL-CRN++, an improved version of our preliminary work PL-CRN that combines a stage recurrent mechanism with complex spectral mapping. Extensive experiments on the TIMIT corpus demonstrate the superiority of our system over state-of-the-art baselines in terms of PESQ, ESTOI, and DNSMOS. Moreover, by adjusting the threshold, we can easily control inference efficiency while maintaining system performance.
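The early exit rule described above can be illustrated with a minimal sketch. This is not the PL-CRN++ implementation: the function names, the stage signature `(noisy, previous_estimate)`, and the choice of mean absolute magnitude difference as the "spectral distance" are all assumptions made for illustration, since the abstract does not specify them. The toy stages simply move the estimate halfway toward a known clean target.

```python
import numpy as np


def spectral_distance(a, b):
    # Assumed metric: mean absolute difference of magnitude spectra.
    # The abstract does not define the distance actually used.
    return float(np.mean(np.abs(np.abs(a) - np.abs(b))))


def adaptive_inference(noisy_spec, stages, threshold):
    """Run stages in order and exit early once two adjacent stage
    outputs differ by less than `threshold`.

    Returns (estimate, number_of_stages_run).
    """
    # Hypothetical convention: the first stage sees the noisy input
    # as its "previous estimate".
    estimate = stages[0](noisy_spec, noisy_spec)
    for used, stage in enumerate(stages[1:], start=2):
        refined = stage(noisy_spec, estimate)
        if spectral_distance(refined, estimate) < threshold:
            return refined, used  # early exit with the latest estimate
        estimate = refined
    return estimate, len(stages)


# Toy usage: each stage halves the remaining error toward `clean`.
clean = np.ones((4, 3))
noisy = clean + 1.0
toy_stages = [lambda n, prev: prev + 0.5 * (clean - prev)] * 5

estimate, used = adaptive_inference(noisy, toy_stages, threshold=0.1)
print(used)  # number of stages actually run
```

In this sketch, raising the threshold makes the loop exit after fewer stages, while a threshold of zero disables early exit and always runs every stage. This mirrors the trade-off between inference cost and output quality that the threshold is meant to control.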