Multi-user VoiceFilter-Lite via Attentive Speaker Embedding - 专知 Papers

November 8, 2021

Multi-user VoiceFilter-Lite via Attentive Speaker Embedding

Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ian McGraw

In this paper, we propose a solution to allow speaker-conditioned speech models, such as VoiceFilter-Lite, to support an arbitrary number of enrolled users in a single pass. This is achieved by using an attention mechanism on multiple speaker embeddings to compute a single attentive embedding, which is then used as a side input to the model. We implemented multi-user VoiceFilter-Lite and evaluated it for three tasks: (1) a streaming automatic speech recognition (ASR) task; (2) a text-independent speaker verification task; and (3) a personalized keyphrase detection task, where ASR has to detect keyphrases from multiple enrolled users in a noisy environment. Our experiments show that, with up to four enrolled users, multi-user VoiceFilter-Lite is able to significantly reduce speech recognition and speaker verification errors when there is overlapping speech, without affecting performance under other acoustic conditions. This attentive speaker embedding approach can also be easily applied to other speaker-conditioned models such as personal VAD and personalized ASR.
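The abstract does not spell out how the attention is computed, so the following is only a minimal sketch of the general idea: several enrolled speaker embeddings are reduced to one attentive embedding by a softmax-weighted sum. The scaled dot-product scoring, the `query` vector (imagined here as some feature derived from the incoming audio), the `temperature` parameter, and the function name are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def attentive_speaker_embedding(query, enrolled, temperature=1.0):
    """Blend N enrolled speaker embeddings into a single embedding.

    query:    (d,) vector derived from the current audio (hypothetical input).
    enrolled: (N, d) matrix with one row per enrolled user.
    Returns the (d,) attentive embedding and the (N,) attention weights.
    """
    query = np.asarray(query, dtype=np.float64)
    enrolled = np.asarray(enrolled, dtype=np.float64)

    # Assumed scoring rule: scaled dot product between query and each embedding.
    scores = enrolled @ query / (temperature * np.sqrt(enrolled.shape[1]))

    # Numerically stable softmax over the N enrolled users.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights = weights / weights.sum()

    # Weighted sum of the enrolled embeddings: the single side input.
    return weights @ enrolled, weights


# Four enrolled users with unit-norm, 8-dimensional embeddings (toy data).
rng = np.random.default_rng(0)
enrolled = rng.normal(size=(4, 8))
enrolled /= np.linalg.norm(enrolled, axis=1, keepdims=True)

# A query that points in the direction of the third user's embedding.
query = 5.0 * enrolled[2]
embedding, weights = attentive_speaker_embedding(query, enrolled)
```

Because the output has the same dimensionality as a single speaker embedding regardless of how many users are enrolled, a model that already accepts one embedding as a side input could in principle consume it unchanged, which is the property the abstract highlights.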
