Speech separation has been very successfully developed with the permutation invariant training (PIT) approach. However, the frequent label-assignment switching that occurs during PIT training remains a problem when faster convergence and better achievable performance are desired. In this paper, we propose self-supervised pre-training to stabilize the label assignment when training speech separation models. Experiments over several types of self-supervised approaches, several typical speech separation models, and two different datasets show that substantial improvements are achievable if a proper self-supervised approach is chosen.
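To make the label-assignment switching concrete, below is a minimal sketch of a PIT loss in PyTorch. It assumes a simple per-sample MSE as the per-source loss and two-speaker mixtures; the actual per-source objective and model setup in the paper may differ. Because the best permutation is re-selected at every training step, the chosen assignment (`best_idx`) can flip between steps, which is the instability that pre-training aims to reduce.

```python
# Hypothetical illustration of a permutation invariant training (PIT) loss;
# tensor shapes and the MSE per-source loss are illustrative assumptions.
import itertools
import torch

def pit_mse_loss(est, ref):
    """est, ref: tensors of shape (batch, n_src, time).

    Returns the loss under the best label assignment per utterance,
    plus the index of that assignment (which may switch between steps).
    """
    batch, n_src, _ = est.shape
    losses = []
    for perm in itertools.permutations(range(n_src)):
        # MSE between estimates and references under this label assignment
        perm_loss = torch.mean((est - ref[:, list(perm), :]) ** 2, dim=(1, 2))
        losses.append(perm_loss)
    losses = torch.stack(losses, dim=1)   # (batch, n_src!)
    best, best_idx = losses.min(dim=1)    # pick the best assignment per utterance
    return best.mean(), best_idx
```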