We introduce two novel unsupervised (blind) source separation methods, which involve self-supervised training from single-channel two-source speech mixtures without any access to the ground truth source signals. Our first method employs permutation invariant training (PIT) to separate artificially-generated mixtures of the original mixtures back into the original mixtures, which we named mixture permutation invariant training (MixPIT). We found this challenging objective to be a valid proxy task for learning to separate the underlying sources. We improve upon this first method by creating mixtures of source estimates and employing PIT to separate these new mixtures in a cyclic fashion. We named this second method cyclic mixture permutation invariant training (MixCycle), where cyclic refers to the fact that we use the same model to produce artificial mixtures and to learn from them continuously. We show that MixPIT outperforms a common baseline (MixIT) on our small dataset (SC09Mix), and they have comparable performance on a standard dataset (LibriMix). Strikingly, we also show that MixCycle surpasses the performance of supervised PIT by being data-efficient, thanks to its inherent data augmentation mechanism. To the best of our knowledge, no other purely unsupervised method is able to match or exceed the performance of supervised training.
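To make the two training objectives concrete, here is a minimal PyTorch sketch, under simplifying assumptions: `TwoSourceSeparator`, `mixpit_step`, and `mixcycle_step` are hypothetical names, the toy network stands in for the actual separation model, and the negative SI-SNR loss approximates the signal-level objective typically used in speech separation; the real training loop, remixing strategy, and architecture differ in detail.

```python
# Hedged sketch of the MixPIT and MixCycle objectives described above.
# All names here are illustrative; the paper's actual model and loss differ.
import torch
import torch.nn as nn


def neg_si_snr(est, ref, eps=1e-8):
    """Negative scale-invariant SNR (lower is better)."""
    ref = ref - ref.mean(dim=-1, keepdim=True)
    est = est - est.mean(dim=-1, keepdim=True)
    s_target = (est * ref).sum(-1, keepdim=True) * ref / (ref.pow(2).sum(-1, keepdim=True) + eps)
    e_noise = est - s_target
    return -10 * torch.log10(s_target.pow(2).sum(-1) / (e_noise.pow(2).sum(-1) + eps) + eps)


def pit_loss(e1, e2, t1, t2):
    """Two-source permutation invariant training loss: best of both pairings."""
    perm_a = neg_si_snr(e1, t1) + neg_si_snr(e2, t2)
    perm_b = neg_si_snr(e1, t2) + neg_si_snr(e2, t1)
    return torch.minimum(perm_a, perm_b).mean()


class TwoSourceSeparator(nn.Module):
    """Toy stand-in for a separation network mapping one mixture to two sources."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 2))

    def forward(self, mix):                      # mix: (batch, time)
        out = self.net(mix.unsqueeze(-1))        # (batch, time, 2)
        return out[..., 0], out[..., 1]


def mixpit_step(model, mix_a, mix_b):
    """MixPIT: separate a mixture of two training mixtures back into those mixtures."""
    mom = mix_a + mix_b                          # mixture of mixtures
    e1, e2 = model(mom)
    return pit_loss(e1, e2, mix_a, mix_b)


def mixcycle_step(model, mix_a, mix_b):
    """MixCycle: remix the model's own source estimates and separate them again."""
    with torch.no_grad():                        # pseudo-targets from the current model
        a1, _ = model(mix_a)
        _, b2 = model(mix_b)
    new_mix = a1 + b2                            # artificial mixture of source estimates
    e1, e2 = model(new_mix)
    return pit_loss(e1, e2, a1, b2)


if __name__ == "__main__":
    model = TwoSourceSeparator()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    mix_a, mix_b = torch.randn(4, 8000), torch.randn(4, 8000)  # dummy 1-second mixtures
    loss = mixpit_step(model, mix_a, mix_b) + mixcycle_step(model, mix_a, mix_b)
    opt.zero_grad(); loss.backward(); opt.step()
```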