Most deep learning-based speech enhancement models are trained in a supervised manner, which means that paired noisy and clean speech is required during training. Consequently, much of the noisy speech recorded in daily life cannot be used to train such models. Although some unsupervised learning frameworks have been proposed to remove the pairing constraint, they still require clean speech or noise for training. In this paper, we therefore propose MetricGAN-U (MetricGAN-unsupervised), which further relaxes the constraints of conventional unsupervised learning. MetricGAN-U requires only noisy speech for training, and the model is trained by optimizing non-intrusive speech quality metrics. Experimental results show that MetricGAN-U outperforms the baselines in both objective and subjective metrics.
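The idea of training from noisy speech alone by optimizing a non-intrusive metric can be sketched as a pair of objectives in the style of the MetricGAN family. The notation is assumed here for illustration and does not come from the abstract: $G$ is the enhancement model, $D$ is a discriminator that learns to predict metric scores, $Q'$ is a non-intrusive quality metric normalized to $[0, 1]$, $x$ is a noisy utterance, and $s$ is the target score (typically $1$). This is a rough sketch under those assumptions, not necessarily the exact losses used in the paper.

```latex
% Discriminator: learn to mimic the non-intrusive metric Q' on both
% enhanced and noisy speech (no clean reference appears anywhere).
\mathcal{L}_D = \mathbb{E}_{x}\Big[\big(D(G(x)) - Q'(G(x))\big)^2
              + \big(D(x) - Q'(x)\big)^2\Big]

% Generator: push the predicted score of the enhanced speech toward s.
\mathcal{L}_G = \mathbb{E}_{x}\Big[\big(D(G(x)) - s\big)^2\Big]
```

Because $Q'$ takes only the signal being scored as input, neither objective needs clean speech or a separate noise recording, which is what allows training on noisy data only.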