We propose speaker separation using speaker inventories and estimated speech (SSUSIES), a framework that leverages speaker profiles and estimated speech for speaker separation. SSUSIES comprises two methods: speaker separation using speaker inventories (SSUSI) and speaker separation using estimated speech (SSUES). SSUSI performs speaker separation with the help of a speaker inventory. By combining the advantages of permutation invariant training (PIT) and speech extraction, SSUSI significantly outperforms conventional approaches. SSUES is a widely applicable technique that can substantially improve speaker separation performance by using the output of a first-pass separation. We evaluate the models on both speaker separation and speech recognition metrics.
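For readers unfamiliar with permutation invariant training, the following is a minimal sketch of the generic utterance-level PIT criterion only; it is not the SSUSI or SSUES model. The function name `pit_mse`, the choice of mean squared error as the per-permutation loss, and the array layout are all assumptions made for illustration.

```python
from itertools import permutations

import numpy as np


def pit_mse(estimates, references):
    """Utterance-level PIT with an MSE criterion (illustrative choice).

    estimates, references: arrays of shape (num_speakers, num_samples).
    Returns (best_loss, best_perm), where best_perm[i] is the index of the
    estimate assigned to reference i.
    """
    estimates = np.asarray(estimates, dtype=float)
    references = np.asarray(references, dtype=float)
    num_speakers = references.shape[0]

    best_loss, best_perm = np.inf, None
    # Try every assignment of estimated streams to reference speakers
    # and keep the one with the lowest loss.
    for perm in permutations(range(num_speakers)):
        loss = np.mean((estimates[list(perm)] - references) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Usage: the estimates are the references in swapped order, so the
# criterion should recover the swap instead of penalising it.
refs = np.array([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]])
loss, perm = pit_mse(refs[::-1], refs)
```

The exhaustive search over permutations grows factorially with the number of speakers, which is acceptable for the two- or three-speaker mixtures typical of this task.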