Deep neural network based speech enhancement approaches aim to learn a noisy-to-clean transformation using a supervised learning paradigm. However, such a well-trained transformation is vulnerable to unseen noises that are not included in the training set. In this work, we focus on the unsupervised noise adaptation problem in speech enhancement, where the ground truth of the target domain data is completely unavailable. Specifically, we propose a generative adversarial network based method to efficiently learn a reverse clean-to-noisy transformation using a few minutes of unpaired target domain data. This transformation is then used to generate sufficient simulated data for domain adaptation of the enhancement model. Experimental results show that our method effectively mitigates the domain mismatch between training and test sets, and surpasses the best baseline by a large margin.
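The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `build_adaptation_pairs` and `toy_clean_to_noisy` are hypothetical, and the additive Gaussian noise is only a placeholder for the trained GAN generator, whose architecture and training the abstract does not specify. The sketch shows only how a clean-to-noisy mapping turns clean speech into simulated (noisy, clean) pairs for adapting the enhancement model.

```python
import random
from typing import Callable, List, Tuple

Waveform = List[float]


def toy_clean_to_noisy(clean: Waveform, rng: random.Random,
                       noise_std: float = 0.1) -> Waveform:
    """Placeholder for the learned clean-to-noisy generator (assumption).

    A real system would apply a GAN generator trained on unpaired
    target-domain recordings; here we just add Gaussian noise.
    """
    return [sample + rng.gauss(0.0, noise_std) for sample in clean]


def build_adaptation_pairs(
    clean_utterances: List[Waveform],
    clean_to_noisy: Callable[[Waveform], Waveform],
) -> List[Tuple[Waveform, Waveform]]:
    """Create simulated (noisy input, clean target) training pairs.

    The clean utterance serves as the supervision target, so no ground
    truth from the target domain is required.
    """
    return [(clean_to_noisy(clean), clean) for clean in clean_utterances]
```

The resulting pairs would then be used to fine-tune the noisy-to-clean enhancement model in the usual supervised way; that step depends on the chosen model and is omitted here.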