We present RemixIT, a simple yet effective self-supervised method for training speech enhancement models without requiring a single isolated in-domain speech or noise waveform. Our approach overcomes limitations of previous methods that make them dependent on clean in-domain target signals and thus sensitive to any domain mismatch between train and test samples. RemixIT is based on a continuous self-training scheme in which a teacher model, pre-trained on out-of-domain data, infers estimated pseudo-target signals for in-domain mixtures. Then, by permuting the estimated clean and noise signals and remixing them together, we generate a new set of bootstrapped mixtures and corresponding pseudo-targets, which are used to train the student network. In turn, the teacher periodically refines its estimates using the updated parameters of the latest student models. Experimental results on multiple speech enhancement datasets and tasks not only show the superiority of our method over prior approaches, but also showcase that RemixIT can be combined with any separation model and applied to any semi-supervised or unsupervised domain adaptation task. Our analysis, paired with empirical evidence, sheds light on the inner workings of our self-training scheme, wherein the student model keeps improving while observing severely degraded pseudo-targets.
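The core bootstrapping step described above (separate with the teacher, permute the noise estimates across the batch, and remix) can be sketched as follows. This is a minimal illustration, not the paper's implementation; `teacher_separate` is a hypothetical stand-in for the pre-trained teacher model.

```python
import numpy as np

def remixit_bootstrap(mixtures, teacher_separate, seed=None):
    """Sketch of RemixIT's bootstrapped remixing step.

    `teacher_separate` is assumed to take a batch of mixture waveforms
    (shape: [batch, samples]) and return estimated speech and noise
    waveforms of the same shape.
    """
    rng = np.random.default_rng(seed)
    # Teacher infers pseudo-targets for the in-domain mixtures.
    est_speech, est_noise = teacher_separate(mixtures)
    # Permute the estimated noise across the batch and remix with the
    # estimated speech to form new bootstrapped training mixtures.
    perm = rng.permutation(len(mixtures))
    new_mixtures = est_speech + est_noise[perm]
    # The student network is then trained to recover these pseudo-targets
    # from the bootstrapped mixtures.
    return new_mixtures, est_speech, est_noise[perm]
```

The permutation is what decouples the new mixtures from the teacher's per-example errors: each bootstrapped mixture pairs a speech estimate with a noise estimate from a different example.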