We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that integrates full- and sub-band (FSB) modeling for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain. The model maintains an information highway that carries an over-complete input representation through multiple FSB-LSTM modules. Each FSB-LSTM module consists of a full-band block, which models spectro-temporal patterns across all frequencies, and a sub-band block, which models patterns within each sub-band. Each of the two blocks takes a down-sampled representation as input and returns an up-sampled discriminative representation, which is added to the block input via a residual connection. The model is designed to have low algorithmic complexity, a small run-time buffer, and very low algorithmic latency, while still delivering strong enhancement performance on a noisy-reverberant speech enhancement task, even when the hop size is as low as $2$ ms.
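The module wiring described above (down-sample, model, up-sample, add back to the block input) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: down-sampling is done by average pooling over neighbouring frequency bins, up-sampling by repetition, and the two LSTMs are replaced by simple causal leaky integrators (`full_band_core`, `sub_band_core`) so that the example stays self-contained. All function names and the `factor`, `alpha`, and `num_modules` parameters are hypothetical.

```python
from typing import Callable, List

Frame = List[float]     # one STFT frame: one feature value per frequency bin
Sequence = List[Frame]  # time-ordered list of frames


def downsample(frame: Frame, factor: int) -> Frame:
    """Average-pool neighbouring frequency bins (assumed down-sampling)."""
    if len(frame) % factor != 0:
        raise ValueError("number of bins must be divisible by factor")
    return [sum(frame[i:i + factor]) / factor for i in range(0, len(frame), factor)]


def upsample(frame: Frame, factor: int) -> Frame:
    """Repeat each value `factor` times (assumed up-sampling)."""
    return [v for v in frame for _ in range(factor)]


def full_band_core(seq: Sequence, alpha: float = 0.5) -> Sequence:
    """Stand-in for the full-band LSTM: one causal state that sees all bins."""
    state, out = 0.0, []
    for frame in seq:
        state = alpha * state + (1 - alpha) * sum(frame) / len(frame)
        out.append([state] * len(frame))
    return out


def sub_band_core(seq: Sequence, alpha: float = 0.5) -> Sequence:
    """Stand-in for the sub-band LSTM: an independent causal state per band."""
    if not seq:
        return []
    state = [0.0] * len(seq[0])
    out = []
    for frame in seq:
        state = [alpha * s + (1 - alpha) * v for s, v in zip(state, frame)]
        out.append(list(state))
    return out


def residual_block(seq: Sequence, core: Callable[[Sequence], Sequence],
                   factor: int) -> Sequence:
    """Down-sample, apply the core, up-sample, then add to the block input."""
    down = [downsample(f, factor) for f in seq]
    up = [upsample(f, factor) for f in core(down)]
    return [[x + y for x, y in zip(f, u)] for f, u in zip(seq, up)]


def fsb_module(seq: Sequence, factor: int = 2) -> Sequence:
    """One module: a full-band block followed by a sub-band block."""
    after_full = residual_block(seq, full_band_core, factor)
    return residual_block(after_full, sub_band_core, factor)


def fsb_stack(seq: Sequence, num_modules: int = 3, factor: int = 2) -> Sequence:
    """Pass the representation through several modules in sequence."""
    for _ in range(num_modules):
        seq = fsb_module(seq, factor)
    return seq
```

Because every block only adds a correction to its input, the representation keeps its shape from module to module, which is one way to read the "information highway" in the text. Both stand-in cores look only at current and past frames, so frame `t` of the output does not depend on later frames. A real system would replace the integrators with trained LSTMs and learned down- and up-sampling layers.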