FullSubNet has shown promising performance on speech enhancement by utilizing both fullband and subband information. However, FullSubNet relates fullband and subband simply by concatenating the output of the fullband model with the subband units. This concatenation supplements the subband units with only a small amount of global information and does not consider the interaction between fullband and subband. This paper proposes a fullband-subband cross-attention (FSCA) module to interactively fuse the global and local information and applies it to FullSubNet. The new framework is called FS-CANet. Moreover, unlike FullSubNet, the proposed FS-CANet optimizes the fullband extractor with temporal convolutional network (TCN) blocks to further reduce the model size. Experimental results on the DNS Challenge - Interspeech 2021 dataset show that the proposed FS-CANet outperforms other state-of-the-art speech enhancement approaches and demonstrate the effectiveness of fullband-subband cross-attention.
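To make the idea of fusing fullband and subband information by cross-attention concrete, the following is a minimal sketch of generic scaled dot-product cross-attention applied in both directions. It is not the FSCA module itself, whose exact design the abstract does not specify. The feature shapes, the random projection matrices, and the names `cross_attention`, `fullband`, and `subband` are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Generic scaled dot-product cross-attention (illustrative only).

    Queries come from one feature stream, keys and values from the other,
    so each query position gathers information from the other stream.
    """
    q = query_feats @ w_q                      # (F, d)
    k = context_feats @ w_k                    # (F, d)
    v = context_feats @ w_v                    # (F, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (F, F)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights

# Hypothetical per-frame features: one vector per frequency bin.
rng = np.random.default_rng(0)
num_freqs, dim = 8, 4
fullband = rng.standard_normal((num_freqs, dim))  # assumed global features
subband = rng.standard_normal((num_freqs, dim))   # assumed local features

# Random projections stand in for learned weights.
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))

# Subband queries attend to fullband context, and vice versa.
sub_fused, attn = cross_attention(subband, fullband, w_q, w_k, w_v)
full_fused, _ = cross_attention(fullband, subband, w_q, w_k, w_v)
```

Compared with plain concatenation, each subband position here receives a weighted mixture of all fullband positions, and the weights depend on both streams. In a trained model the projections would be learned and the two directions would typically use separate parameters.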