This paper presents the details of our system designed for Task 1 of the Multimodal Information Based Speech Processing (MISP) Challenge 2021. The goal of Task 1 is to leverage both audio and video information to improve the environmental robustness of far-field wake word spotting. In the proposed system, we first apply speech enhancement algorithms such as beamforming and weighted prediction error (WPE) to process the multi-microphone conversational audio. Second, several data augmentation techniques are applied to simulate more realistic far-field scenarios. For the video information, the provided region of interest (ROI) is used to obtain the visual representation. A multi-layer CNN is then proposed to learn audio and visual representations, which are fed into our two-branch attention-based fusion network, built on architectures such as the transformer and the conformer. The focal loss is used to fine-tune the model and improves performance significantly. Finally, multiple trained models are integrated by voting to achieve our final score of 0.091.
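Since the abstract highlights fine-tuning with the focal loss, the following is a minimal sketch of a binary focal loss in PyTorch, assuming a wake-word/non-wake-word classification head with raw logits; the alpha and gamma defaults are illustrative choices, not the values used in this system.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss (Lin et al., 2017): down-weights easy examples
    so training focuses on hard positives/negatives.

    logits:  raw model outputs, shape (batch,)
    targets: 0/1 float labels, shape (batch,)
    alpha, gamma: illustrative defaults, not the paper's settings.
    """
    # Per-example binary cross-entropy, kept unreduced.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # bce = -log(p_t), so p_t (probability of the true class) is exp(-bce).
    p_t = torch.exp(-bce)
    # Class-balancing weight: alpha for positives, (1 - alpha) for negatives.
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # The modulating factor (1 - p_t)^gamma shrinks the loss of
    # well-classified examples, emphasizing the hard ones.
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```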