CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition - Zhuanzhi Papers



January 11, 2022

CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition


Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Shadow Yiu, Rita Frieske, Holy Lovenia, Genta Indra Winata, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

From arXiv, 6 pages

With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component that facilitates driving and provides extra functionality. In-car smart assistants should be able to process both general and car-related commands and perform the corresponding actions, which eases driving and improves safety. However, data scarcity for low-resource languages hinders the development of research and applications. In this paper, we introduce a new dataset, Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR), for in-car command recognition in the Cantonese language with both video and audio data. It consists of 4,984 samples (8.3 hours) of 200 in-car commands recorded by 30 native Cantonese speakers. Furthermore, we augment our dataset using common in-car background noises to simulate real environments, producing a dataset 10 times larger than the collected one. We provide detailed statistics of both the clean and the augmented versions of our dataset. Moreover, we implement two multimodal baselines to demonstrate the validity of CI-AVSR. Experimental results show that leveraging the visual signal improves the overall performance of the model. Although our best model achieves considerable quality on the clean test set, speech recognition quality on the noisy data is still inferior, and this remains an extremely challenging task for real in-car speech recognition systems. The dataset and code will be released at https://github.com/HLTCHKUST/CI-AVSR.
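The abstract says the clean recordings are augmented with in-car background noise but does not give the mixing procedure, the noise types, or the noise levels. As a rough illustration only, the sketch below shows one common way such augmentation is done: scaling a noise clip so that the mixture reaches a chosen signal-to-noise ratio (SNR). The function names, the tiling of short noise clips, and the SNR values are all assumptions made for this example, not details taken from the paper.

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mix has the requested SNR in dB.

    Hypothetical helper: the noise clip is repeated if it is shorter
    than the speech, then trimmed to the same length.
    """
    clean, noise = list(clean), list(noise)
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    repeats = -(-len(clean) // len(noise))  # ceiling division
    tiled = (noise * repeats)[: len(clean)]
    noise_rms = rms(tiled)
    if noise_rms == 0.0:
        return clean  # silent noise clip: nothing to add
    # SNR_dB = 20 * log10(rms_clean / rms_noise); solve for the noise gain.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [c + gain * n for c, n in zip(clean, tiled)]


def augment(clean, noises, snrs_db):
    """Return one noisy copy of `clean` per (noise clip, SNR) pair."""
    return [mix_at_snr(clean, n, s) for n in noises for s in snrs_db]
```

Under these assumptions, five noise clips combined with two SNR levels would give ten noisy copies of each utterance, which is one way a tenfold expansion could arise; the actual combination used for CI-AVSR is not stated in the abstract.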

