In multichannel speech enhancement, both spectral and spatial information are vital for discriminating between speech and noise. How to fully exploit these two types of information, together with their temporal dynamics, remains an interesting research problem. To address it, this paper proposes a multi-cue fusion network, named McNet, which cascades four modules that exploit, respectively, full-band spatial, narrow-band spatial, sub-band spectral, and full-band spectral information. Experiments show that each module in the proposed network makes its own contribution and that the network as a whole notably outperforms other state-of-the-art methods.
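The four-module cascade can be pictured with a minimal sketch of the data layout each stage might see. Everything here is an assumption made for illustration, not the paper's implementation: the abstract does not specify the modules' internals, so `stage` is a hypothetical stand-in (a fixed random linear map with `tanh`) for a learned model. The sketch also assumes that "full-band" means one sequence over all frequency bins per time frame, "narrow-band" means one sequence over time per frequency bin, and "sub-band" means a bin together with its neighbouring bins. The function names, the hidden size, and the two-value output per time-frequency bin are likewise hypothetical.

```python
import numpy as np


def full_band_view(x):
    """(T, F, D) -> (T, F, D): each time frame is one sequence over all F bins."""
    return x


def narrow_band_view(x):
    """(T, F, D) -> (F, T, D): each frequency bin is one sequence over time."""
    return np.transpose(x, (1, 0, 2))


def sub_band_view(x, n_neighbors=1):
    """(T, F, D) -> (T, F, (2n+1)*D): each bin concatenated with its neighbours."""
    padded = np.pad(x, ((0, 0), (n_neighbors, n_neighbors), (0, 0)), mode="edge")
    n_freq = x.shape[1]
    shifted = [padded[:, k:k + n_freq, :] for k in range(2 * n_neighbors + 1)]
    return np.concatenate(shifted, axis=-1)


def stage(x, out_dim, rng):
    """Hypothetical placeholder for a learned module: linear map on the last axis."""
    w = rng.standard_normal((x.shape[-1], out_dim)) / np.sqrt(x.shape[-1])
    return np.tanh(x @ w)


def cascade(stft, hidden=8, seed=0):
    """Run four placeholder stages in the order named in the abstract.

    stft: complex array of shape (T, F, C) with C microphone channels.
    Returns a real array of shape (T, F, 2), an assumed real/imaginary output.
    """
    rng = np.random.default_rng(seed)
    # Stack real and imaginary parts of all channels as input features.
    x = np.concatenate([stft.real, stft.imag], axis=-1)    # (T, F, 2C)
    h = stage(full_band_view(x), hidden, rng)               # full-band spatial
    h = stage(narrow_band_view(h), hidden, rng)             # narrow-band spatial
    h = np.transpose(h, (1, 0, 2))                          # back to (T, F, hidden)
    h = stage(sub_band_view(h), hidden, rng)                # sub-band spectral
    return stage(full_band_view(h), 2, rng)                 # full-band spectral


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    demo = rng.standard_normal((5, 7, 2)) + 1j * rng.standard_normal((5, 7, 2))
    print(cascade(demo).shape)
```

The sketch only shows the cascade order and the tensor layouts; which input features each real module receives, and how it models them, is left to the paper itself.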