Frame-online speech enhancement systems in the short-time Fourier transform (STFT) domain usually have an algorithmic latency equal to the window size, due to the use of the overlap-add algorithm in the inverse STFT (iSTFT). This algorithmic latency allows the enhancement models to leverage future contextual information up to a length equal to the window size. However, current frame-online systems only partially leverage this future information. To fully exploit it, this study proposes an overlapped-frame prediction technique for deep-learning-based frame-online speech enhancement, where at each frame our deep neural network (DNN) predicts the current frame and the several past frames that are necessary for overlap-add, instead of predicting only the current frame. In addition, we propose a novel loss function to account for the scale difference between the predicted and oracle target signals. Evaluation results on a noisy-reverberant speech enhancement task show the effectiveness of the proposed algorithms.
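To make the overlap-add mechanics concrete, the following is a minimal sketch (not the paper's implementation) of how per-step DNN outputs covering the current and past frames can be overlap-added online. The window and hop sizes, the Hann synthesis window, the weighted-overlap-add normalization, and the `predicted_frame_stream` interface are all illustrative assumptions.

```python
# A minimal NumPy sketch of the synthesis side of overlapped-frame prediction.
# The window/hop sizes, synthesis window, and the shape of the DNN output
# stream are illustrative placeholders, not taken from the paper.
import numpy as np

win_size, hop = 512, 128      # assumed window and hop sizes
n_overlap = win_size // hop   # frames overlapping any output sample (4 here)
window = np.hanning(win_size) # assumed synthesis window

def online_overlap_add(predicted_frame_stream):
    """predicted_frame_stream yields, at each step t, an (n_overlap, win_size)
    array holding the DNN's time-domain estimates of frames t, t-1, ...,
    t-n_overlap+1, i.e. the current frame plus the past frames needed for
    overlap-add. Because all of these estimates come from the same (latest)
    DNN call, each past-frame estimate has seen input up to the end of the
    current frame, the full future context that the window-size latency allows."""
    for frames in predicted_frame_stream:
        out = np.zeros(hop)
        norm = np.zeros(hop)
        for k in range(n_overlap):
            # Frame t-k overlaps the newly finalized `hop` samples at offset k*hop.
            seg = slice(k * hop, (k + 1) * hop)
            out += window[seg] * frames[k, seg]
            norm += window[seg] ** 2
        # Standard weighted-overlap-add normalization; emit one hop of output.
        yield out / np.maximum(norm, 1e-8)
```

In a conventional frame-online pipeline, the same overlap-add would instead combine frame estimates produced at different past steps, so the older estimates could not benefit from the newest input; re-predicting the overlapping past frames at every step is what allows the future context afforded by the window-size latency to be fully used.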