We introduce two unsupervised source separation methods, which involve self-supervised training from single-channel two-source speech mixtures. Our first method, mixture permutation invariant training (MixPIT), enables learning a neural network model which separates the underlying sources via a challenging proxy task without supervision from the reference sources. Our second method, cyclic mixture permutation invariant training (MixCycle), uses MixPIT as a building block in a cyclic fashion for continuous learning. MixCycle gradually converts the problem from separating mixtures of mixtures into separating single mixtures. We compare our methods to common supervised and unsupervised baselines: permutation invariant training with dynamic mixing (PIT-DM) and mixture invariant training (MixIT). We show that MixCycle outperforms MixIT and reaches a performance level very close to the supervised baseline (PIT-DM) while circumventing the over-separation issue of MixIT. Also, we propose a self-evaluation technique inspired by MixCycle that estimates model performance without utilizing any reference sources. We show that it yields results consistent with an evaluation on reference sources (LibriMix) and also with an informal listening test conducted on a real-life mixtures dataset (REAL-M).
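To make the training objective concrete, the following is a minimal PyTorch-style sketch (not the authors' code) of a two-output permutation invariant loss and of how a MixPIT-style step applies it to a mixture of mixtures, using the two single mixtures as proxy targets. The plain MSE criterion, the function names, and the assumed tensor shapes are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a two-source permutation invariant (PIT) loss and a MixPIT-style
# training step. The criterion (MSE) and shapes are illustrative; the paper
# may use a different loss (e.g. an SI-SDR-based one).
import torch


def pairwise_mse(est: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """Mean squared error for every (estimate, reference) pair.

    est, ref: (batch, 2, time)
    returns:  (batch, 2, 2) where [:, i, j] = MSE(est_i, ref_j)
    """
    diff = est.unsqueeze(2) - ref.unsqueeze(1)   # (batch, 2, 2, time)
    return diff.pow(2).mean(dim=-1)


def pit_loss_two_sources(est: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """PIT for exactly two outputs: keep the cheaper of the two permutations."""
    cost = pairwise_mse(est, ref)                # (batch, 2, 2)
    perm_a = cost[:, 0, 0] + cost[:, 1, 1]       # identity assignment
    perm_b = cost[:, 0, 1] + cost[:, 1, 0]       # swapped assignment
    return torch.minimum(perm_a, perm_b).mean()


def mixpit_step(model, mix1: torch.Tensor, mix2: torch.Tensor) -> torch.Tensor:
    """One MixPIT-style step (sketch): the network sees mix1 + mix2 (a mixture
    of mixtures) and the PIT targets are the original mixtures themselves."""
    mom = mix1 + mix2                            # (batch, time)
    est = model(mom)                             # assumed to return (batch, 2, time)
    ref = torch.stack([mix1, mix2], dim=1)       # (batch, 2, time)
    return pit_loss_two_sources(est, ref)
```

In this reading, supervision never touches the reference sources: the only targets are the training mixtures, which is what makes the task a proxy for true source separation.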