Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

Self-supervised learning (SSL) methods, which learn representations of data without explicit supervision, have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often perform worse in multi-talker scenarios, possibly because of domain mismatch, and this severely limits their use in such applications. In this paper, we investigate the adaptation of upstream SSL models to the multi-talker automatic speech recognition (ASR) task under two conditions. First, when segmented utterances are given, we show that adding a target speaker extraction (TSE) module based on enrollment embeddings is complementary to mixture-aware pre-training. Second, for unsegmented mixtures, we propose a novel joint speaker modeling (JSM) approach, which aggregates information from all speakers in the mixture through their embeddings. With controlled experiments on Libri2Mix, we show that using speaker embeddings provides relative word error rate (WER) improvements of 9.1% and 42.1% over strong baselines for the segmented and unsegmented cases, respectively. We also demonstrate the effectiveness of our models on real conversational mixtures through experiments on the AMI dataset.
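
The abstract does not say how the speaker embeddings are injected into the SSL model. Purely as an illustration, the sketch below assumes FiLM-style conditioning (a per-channel scale and shift predicted from an embedding), which is a common conditioning technique and not necessarily what the authors' TSE or JSM modules do. The function names, tensor shapes, and the "own embedding plus mean of the other speakers" aggregation used for the JSM-like case are all hypothetical. It also shows how a relative WER improvement is computed.

```python
import numpy as np


def film(features: np.ndarray, emb: np.ndarray,
         w_gamma: np.ndarray, w_beta: np.ndarray) -> np.ndarray:
    """FiLM-style conditioning (assumed, not taken from the paper).

    features: (T, D) frame-level SSL features
    emb:      (E,)  speaker embedding
    w_gamma, w_beta: (E, D) hypothetical learned projections
    """
    gamma = emb @ w_gamma                 # (D,) per-channel scale offset
    beta = emb @ w_beta                   # (D,) per-channel shift
    # Broadcast over the T frames; using 1 + gamma leaves the features
    # unchanged when the embedding is all zeros.
    return features * (1.0 + gamma) + beta


def tse_condition(features, target_emb, w_gamma, w_beta):
    """Segmented case (illustrative): condition on one enrollment embedding."""
    return film(features, target_emb, w_gamma, w_beta)


def joint_condition(features, all_embs, w_gamma, w_beta):
    """Unsegmented case (hypothetical JSM-like aggregation).

    all_embs: (K, E) embeddings of all K speakers in the mixture.
    w_gamma, w_beta: (2E, D), because each speaker's own embedding is
    concatenated with the mean embedding of the other speakers.
    Returns (K, T, D): one conditioned feature stream per speaker.
    """
    streams = []
    for k in range(all_embs.shape[0]):
        others = np.delete(all_embs, k, axis=0)          # (K-1, E)
        context = others.mean(axis=0) if len(others) else np.zeros_like(all_embs[k])
        cond = np.concatenate([all_embs[k], context])    # (2E,)
        streams.append(film(features, cond, w_gamma, w_beta))
    return np.stack(streams)


def relative_wer_reduction(wer_baseline: float, wer_new: float) -> float:
    """Relative WER improvement: (baseline - new) / baseline."""
    return (wer_baseline - wer_new) / wer_baseline


# Usage with random (untrained) weights and toy sizes:
rng = np.random.default_rng(0)
T, D, E, K = 50, 16, 8, 2
feats = rng.standard_normal((T, D))
embs = rng.standard_normal((K, E))
print(tse_condition(feats, embs[0], rng.standard_normal((E, D)),
                    rng.standard_normal((E, D))).shape)        # (50, 16)
print(joint_condition(feats, embs, rng.standard_normal((2 * E, D)),
                      rng.standard_normal((2 * E, D))).shape)  # (2, 50, 16)
print(relative_wer_reduction(20.0, 15.0))  # hypothetical WERs -> 0.25
```

Because the improvements are relative, a 9.1% improvement means the new system's WER is about 0.909 times the baseline's WER, whatever the absolute values are. The WER values in the usage lines are made up for illustration and are not results from the paper.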
