Many applications, such as digital hearing aids, acoustically transparent hearing devices, and public address systems, need speech enhancement algorithms that achieve good performance while keeping latency low. To improve on the performance of traditional low-latency speech enhancement algorithms, a deep filter-bank equalizer (FBE) framework was proposed, which integrated a deep learning-based subband noise reduction network with a deep learning-based shortened digital filter mapping network. In the first network, a deep learning model was trained with a controllable, small frame shift to meet the low-latency requirement, i.e., $\le$ 4 ms, and to estimate (complex) subband gains, which can be regarded as an adaptive digital filter in each frame. In the second network, to reduce the latency further, this adaptive digital filter was implicitly shortened by a deep learning-based framework and then applied to the noisy speech to reconstruct the enhanced speech without the overlap-add method. Experimental results on the WSJ0-SI84 corpus indicated that the proposed deep FBE, with only 4-ms latency, achieved much better performance than traditional low-latency speech enhancement algorithms in terms of metrics such as PESQ, STOI, and the amount of noise reduction.
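As a rough illustration of the filter-bank equalizer idea summarized above, the following Python sketch converts per-frame subband gains into a short time-domain FIR filter and applies it directly to the noisy signal, with no overlap-add. It is not the proposed method: the gains are assumed to be given and real-valued, and a plain windowed truncation stands in for the learned filter-shortening network. The function names `gains_to_fir` and `fbe_filter`, and the parameters `half_len` and `hop`, are hypothetical.

```python
import numpy as np


def gains_to_fir(gains, half_len):
    """Map one-sided subband gains (K/2 + 1 bins) to a shortened FIR filter.

    Assumption: gains are real-valued, so the full-length impulse response
    is zero-phase and symmetric around index 0.
    """
    taps = 2 * half_len + 1
    h_full = np.fft.irfft(gains)           # length-K impulse response
    if taps > len(h_full):
        raise ValueError("shortened filter is longer than the full filter")
    h = np.roll(h_full, half_len)[:taps]   # centre the response, then truncate
    return h * np.hanning(taps)            # taper the truncation edges


def fbe_filter(x, gains_per_frame, hop, half_len):
    """Filter x sample by sample, switching to a new FIR filter every `hop` samples."""
    taps = 2 * half_len + 1
    padded = np.concatenate([np.zeros(taps - 1), x])  # zero history before x[0]
    y = np.zeros(len(x))
    for i, gains in enumerate(gains_per_frame):
        start = i * hop
        if start >= len(x):
            break
        stop = min(start + hop, len(x))
        h = gains_to_fir(gains, half_len)
        segment = padded[start:stop + taps - 1]
        # 'valid' convolution yields exactly the stop - start output samples
        y[start:stop] = np.convolve(segment, h, mode="valid")
    return y
```

In this sketch the algorithmic delay is `half_len` samples, set by the centre of the shortened filter rather than by the analysis frame length. For example, at an assumed 16 kHz sampling rate, `half_len = 64` would correspond to 4 ms.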