Despite the recent success of speech separation models, they fail to separate sources properly when facing different sets of speakers or noisy environments. To tackle this problem, we propose applying meta-learning to the speech separation task. We aim to find a meta-initialization from which the model can quickly adapt to new speakers after seeing only one mixture generated by those speakers. In this paper, we apply the model-agnostic meta-learning (MAML) algorithm and the almost no inner loop (ANIL) algorithm to Conv-TasNet to achieve this goal. The experimental results show that our model can adapt not only to a new set of speakers but also to noisy environments. Furthermore, we find that the encoder and decoder serve as the feature-reuse layers, while the separator is the task-specific module.
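To make the distinction between the two inner loops concrete, the following is a minimal toy sketch, not the implementation used in this work. It assumes three scalar parameters named `encoder`, `separator`, and `decoder` as stand-ins for the Conv-TasNet modules, a synthetic regression task in place of speech mixtures, and a first-order approximation of the meta-gradient in place of the second-order MAML update. All function names and hyperparameters are hypothetical. A MAML-style inner loop adapts every parameter, whereas an ANIL-style inner loop adapts only the task-specific part (here, the separator) and reuses the rest.

```python
import random

# Hypothetical stand-ins for the Conv-TasNet modules (assumption: scalars only).
MAML_KEYS = ("encoder", "separator", "decoder")  # inner loop adapts everything
ANIL_KEYS = ("separator",)                       # inner loop adapts only this part


def predict(p, x):
    """Toy model: the output is the input scaled by all three parameters."""
    return p["decoder"] * p["separator"] * p["encoder"] * x


def loss_and_grads(p, batch):
    """Mean squared error and its analytic gradient for each parameter."""
    n = len(batch)
    loss = 0.0
    grads = {k: 0.0 for k in p}
    for x, y in batch:
        err = predict(p, x) - y
        loss += err * err / n
        for k in p:
            others = 1.0  # product of the parameters other than p[k]
            for j in p:
                if j != k:
                    others *= p[j]
            grads[k] += 2.0 * err * others * x / n
    return loss, grads


def inner_adapt(p, support, lr, adapt_keys, steps=1):
    """Inner loop: gradient steps on the support set for the chosen keys only."""
    q = dict(p)  # copy, so the meta-initialization itself is not modified
    for _ in range(steps):
        _, grads = loss_and_grads(q, support)
        for k in adapt_keys:
            q[k] -= lr * grads[k]
    return q


def make_task(gain, rng, n=8):
    """Synthetic task (assumption): recover y = gain * x from n samples."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, gain * x) for x in xs]


def meta_train(adapt_keys, gains, meta_steps=200, inner_lr=0.05,
               outer_lr=0.01, seed=0):
    """Outer loop with a first-order approximation of the meta-gradient."""
    rng = random.Random(seed)
    p = {"encoder": 1.0, "separator": 1.0, "decoder": 1.0}
    for _ in range(meta_steps):
        gain = rng.choice(gains)
        support = make_task(gain, rng)
        query = make_task(gain, rng)
        adapted = inner_adapt(p, support, inner_lr, adapt_keys)
        # First-order shortcut: the query gradient taken at the adapted
        # parameters is applied directly to the meta-initialization.
        _, grads = loss_and_grads(adapted, query)
        for k in p:
            p[k] -= outer_lr * grads[k]
    return p


if __name__ == "__main__":
    for name, keys in (("MAML-style", MAML_KEYS), ("ANIL-style", ANIL_KEYS)):
        init = meta_train(keys, gains=[0.5, 1.5, 2.0])
        rng = random.Random(1)
        support, query = make_task(3.0, rng), make_task(3.0, rng)
        adapted = inner_adapt(init, support, 0.05, keys)
        print(name, "query loss before:", loss_and_grads(init, query)[0],
              "after:", loss_and_grads(adapted, query)[0])
```

In this sketch, the only difference between the two variants is the `adapt_keys` argument passed to `inner_adapt`. The outer loop still updates every parameter in both cases.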