In many speech-enabled human-machine interaction scenarios, user speech can overlap with the device playback audio. In these instances, the performance of tasks such as keyword spotting (KWS) and device-directed speech detection (DDD) can degrade significantly. To address this problem, we propose an implicit acoustic echo cancellation (iAEC) framework in which a neural network is trained to exploit the additional information from a reference microphone channel, learning to ignore the interfering signal and improve detection performance. We study this framework for the tasks of KWS and DDD on, respectively, an augmented version of Google Speech Commands v2 and a real-world Alexa device dataset. Notably, we show a $56\%$ reduction in false-reject rate for the DDD task under device playback conditions. For the KWS task, we also show comparable or superior performance over a strong end-to-end neural echo cancellation + KWS baseline, at an order of magnitude lower computational cost.