Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech

Speaker verification has been studied mostly under the single-talker condition, and its performance degrades in the presence of interfering speakers. Inspired by work on target speaker extraction, e.g., SpEx, we propose a unified speaker verification framework for both single- and multi-talker speech that is able to pay selective auditory attention to the target speaker. This target speaker verification (tSV) framework jointly optimizes a speaker attention module and a speaker representation module via multi-task learning. We study four different target speaker embedding schemes under the tSV framework. The experimental results show that all four target speaker embedding schemes significantly outperform other competitive solutions for multi-talker speech. Notably, the best tSV speaker embedding scheme achieves 76.0% and 55.3% relative improvements in equal error rate (EER) over the baseline system on the WSJ0-2mix-extr and Libri2Mix corpora, respectively, for 2-talker speech, while the performance of tSV for single-talker speech is on par with that of a traditional speaker verification system that is trained and evaluated under the same single-talker condition.
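The abstract reports its results as relative improvements in equal error rate. As a rough illustration of how those two quantities are usually computed, here is a minimal sketch. It assumes higher scores mean "more likely the target speaker", approximates the EER by sweeping the observed scores as thresholds, and uses made-up toy scores. The function names `equal_error_rate` and `relative_improvement` are hypothetical and not taken from the paper, and none of the numbers below are the paper's results.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by using every observed score as a threshold.

    Assumption: a trial is accepted when its score is >= the threshold.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))

    # False rejection rate: fraction of target trials scored below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: fraction of non-target trials at or above it.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)


def relative_improvement(baseline_eer, system_eer):
    """Relative EER reduction over a baseline, in percent."""
    return 100.0 * (baseline_eer - system_eer) / baseline_eer


if __name__ == "__main__":
    # Toy verification scores, invented purely for illustration.
    toy_target = [0.9, 0.8, 0.7, 0.3]
    toy_nontarget = [0.6, 0.4, 0.2, 0.1]
    print("EER:", equal_error_rate(toy_target, toy_nontarget))

    # Hypothetical baseline and system EERs (in percent), not from the paper.
    print("Relative improvement (%):", relative_improvement(8.0, 2.0))
```

On these toy scores the sketch gives an EER of 0.25, and the hypothetical 8.0% to 2.0% EER reduction corresponds to a 75% relative improvement. Evaluation toolkits typically interpolate between thresholds instead of picking the nearest one, so values from this sketch can differ slightly on real trial lists.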


