In this paper, we introduce a spectral-domain inverse filtering approach for single-channel speech de-reverberation using a deep convolutional neural network (CNN). The main goal is to better handle realistic reverberant conditions in which the room impulse response (RIR) is longer than the short-time Fourier transform (STFT) analysis window. To this end, we model the reverberant speech signal with the convolutive transfer function (CTF) model. In the proposed framework, the CNN is trained to directly estimate the inverse filter of the CTF model. Among the various possible CNN structures, we adopt the U-Net, a fully convolutional autoencoder network with skip connections. Experimental results show that the proposed method achieves better de-reverberation performance than widely used benchmark algorithms under various reverberation conditions.
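When the RIR is longer than the STFT window, the usual single-tap approximation X(k,l) ≈ H(k) S(k,l) breaks down. The CTF model instead represents each frequency bin k as a convolution along the frame index l, X(k,l) = Σ_m H(k,m) S(k,l−m), and inverse filtering applies a second filter G(k,m) along frames so that Σ_m G(k,m) X(k,l−m) ≈ S(k,l). The sketch below illustrates this for a single bin. It assumes a hypothetical two-tap CTF and an FIR inverse filter. The analytic truncated inverse stands in for the filter that the CNN would estimate, and the helper names `ctf_filter` and `truncated_inverse` are illustrative, not from the paper.

```python
import cmath
from typing import List


def ctf_filter(taps: List[complex], frames: List[complex]) -> List[complex]:
    """Causally convolve one frequency bin's STFT frames with a filter along frames.

    out[l] = sum_m taps[m] * frames[l - m], with frames[l] = 0 for l < 0,
    truncated to len(frames) output frames.
    """
    out = []
    for l in range(len(frames)):
        acc = 0j
        for m, h in enumerate(taps):
            if l - m < 0:
                break  # later taps reach even earlier (zero) frames
            acc += h * frames[l - m]
        out.append(acc)
    return out


def truncated_inverse(a: complex, num_taps: int) -> List[complex]:
    """FIR approximation of the inverse of the two-tap CTF [1, a].

    The exact inverse of 1 + a z^-1 is sum_m (-a)^m z^-m (stable for |a| < 1);
    keeping num_taps terms leaves a residual term -(-a)^num_taps z^-num_taps.
    """
    return [(-a) ** m for m in range(num_taps)]


# Toy single-bin example; all values are made up for illustration.
a = 0.6 * cmath.exp(0.4j)                     # hypothetical inter-frame reverberation tap
h = [1.0 + 0j, a]                             # hypothetical two-tap CTF for this bin
s = [cmath.exp(0.3j * l) for l in range(20)]  # "clean" STFT frames in this bin
x = ctf_filter(h, s)                          # reverberant frames (CTF model)
g = truncated_inverse(a, num_taps=8)          # inverse filter (a CNN would estimate this)
s_hat = ctf_filter(g, x)                      # de-reverberated estimate

reverb_err = max(abs(u - v) for u, v in zip(s, x))
err = max(abs(u - v) for u, v in zip(s, s_hat))
print(f"reverberant error: {reverb_err:.3f}, after inverse filtering: {err:.2e}")
```

Because the cascade of `g` and `h` equals 1 − (−a)^8 z^−8, the residual error after inverse filtering is bounded by |a|^8. Lengthening the inverse filter shrinks it further, which is one reason to estimate a multi-frame inverse filter rather than a single spectral gain.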