Recently, the speech separation (SS) task has achieved remarkable progress driven by deep learning techniques. However, it remains challenging to separate target signals from a noisy mixture, as neural models are prone to assigning background noise to each speaker. In this paper, we propose a noise-aware SS method called NASS, which aims to improve the speech quality of separated signals under noisy conditions. Specifically, NASS treats background noise as an independent speaker and predicts it alongside the other speakers in a mask-based manner. We then apply patch-wise contrastive learning at the feature level to minimize the mutual information between the predicted noise-speaker and the other speakers, which suppresses noise information in the separated signals. Experimental results show that NASS effectively improves noise robustness for different mask-based separation backbones with an increase of fewer than 0.1M parameters. Furthermore, SI-SNRi results demonstrate that NASS achieves state-of-the-art performance on the WHAM! dataset.
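To make the contrastive step concrete, the following is a minimal sketch of a patch-wise InfoNCE-style loss between the predicted noise-speaker features and one speaker's features. This is an illustrative assumption, not the NASS implementation: the function name `patchwise_contrastive_loss`, the use of a lightly perturbed second view as the positive, and the patch size and temperature values are all hypothetical choices, and the paper's actual positive/negative sampling may differ.

```python
import torch
import torch.nn.functional as F

def patchwise_contrastive_loss(spk_feat, noise_feat, patch_size=16, tau=0.1):
    """Sketch of a patch-wise InfoNCE loss (assumed formulation, not the paper's code).

    spk_feat, noise_feat: (B, C, T) intermediate features for one predicted
    speaker and the predicted noise-speaker. Each speaker patch (anchor) is
    pulled toward a second view of itself and pushed away from noise patches,
    discouraging noise content in the speaker features.
    """
    B, C, T = spk_feat.shape
    n = T // patch_size

    def to_patches(x):
        # (B, C, T) -> (B*n, C): mean-pool each time patch, then L2-normalize
        x = x[:, :, : n * patch_size].reshape(B, C, n, patch_size).mean(-1)
        return F.normalize(x.permute(0, 2, 1).reshape(B * n, C), dim=-1)

    anchor = to_patches(spk_feat)
    # Positive: a perturbed view of the same speaker features (an assumption here)
    pos = to_patches(spk_feat + 0.01 * torch.randn_like(spk_feat))
    # Negatives: patches of the predicted noise-speaker
    neg = to_patches(noise_feat)

    pos_sim = (anchor * pos).sum(-1, keepdim=True) / tau   # (B*n, 1)
    neg_sim = anchor @ neg.t() / tau                       # (B*n, B*n)
    logits = torch.cat([pos_sim, neg_sim], dim=1)
    # The positive similarity sits at index 0 of each row
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)
```

Minimizing a loss of this form drives speaker patches away from noise patches in feature space, which is one way to realize the mutual-information suppression described above.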