关于视听现场分类多种模式联合建模和数据增强的研究 (A study on joint modeling and data augmentation of multi-modalities for audio-visual scene classification) - 专知论文

会员服务 ·

0

数据增强 · MoDELS · RandAugment · Networking · Mixup ·

2022 年 3 月 7 日

A study on joint modeling and data augmentation of multi-modalities for audio-visual scene classification

翻译：关于视听现场分类多种模式联合建模和数据增强的研究

Qing Wang,Jun Du,Siyuan Zheng,Yunqing Li,Yajian Wang,Yuzhong Wu,Hu Hu,Chao-Han Huck Yang,Sabato Marco Siniscalchi,Yannan Wang,Chin-Hui Lee

from arxiv, 5 pages, 1 figure

In this paper, we propose two techniques, namely joint modeling and data augmentation, to improve system performances for audio-visual scene classification (AVSC). We employ pre-trained networks trained only on image data sets to extract video embedding; whereas for audio embedding models, we decide to train them from scratch. We explore different neural network architectures for joint modeling to effectively combine the video and audio modalities. Moreover, data augmentation strategies are investigated to increase audio-visual training set size. For the video modality the effectiveness of several operations in RandAugment is verified. An audio-video joint mixup scheme is proposed to further improve AVSC performances. Evaluated on the development set of TAU Urban Audio Visual Scenes 2021, our final system can achieve the best accuracy of 94.2% among all single AVSC systems submitted to DCASE 2021 Task 1b.

翻译：在本文中,我们提出了两种技术,即联合建模和数据扩增,以提高视听场景分类系统性能(AVSC),我们只使用通过图像数据集培训的预先培训的网络来提取视频嵌入;对于音频嵌入模型,我们决定从零开始对其进行培训;我们探索不同的神经网络结构,以便联合建模,有效地将视频和音频模式结合起来;此外,还研究数据扩增战略,以提高视听培训的成套规模;对于视频模式,验证了RandAugment的若干操作的有效性;提议了一项视听联合混合计划,以进一步改进AVSC的性能;对TAU城市音频视觉场2021的成套开发进行了评估,我们的最后系统可以在提交DCASE 2021任务1b的所有单一的AVSC系统中实现94.2%的最佳精确度。

0

相关内容

数据增强

数据增强在机器学习领域多指采用一些方法（比如数据蒸馏，正负样本均衡等）来提高模型数据集的质量，增强数据。

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

96+阅读 · 2020年4月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

精神分裂症易感因子ErbB4对篮状细胞和吊灯状细胞神经环路发育的调控和机制

国家自然科学基金

0+阅读 · 2014年12月31日

β-catenin/Ets1复合体在胶质母细胞瘤中对hTERT表达调控机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于时间序列的农田防护林年龄结构遥感识别研究

国家自然科学基金

0+阅读 · 2013年12月31日

lncRNAs和miR-592的相互作用对mESC向神经元分化的影响

国家自然科学基金

0+阅读 · 2012年12月31日

Cystatin B缺失与Prion疾病自噬作用机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition

Arxiv

0+阅读 · 2022年4月20日

Retrieval Enhanced Data Augmentation for Question Answering on Privacy Policies

Retrieval Enhanced Data Augmentation for Question Answering on Privacy Policies

Arxiv

0+阅读 · 2022年4月19日

A Survey on Visual Transformer

Arxiv

19+阅读 · 2020年12月23日

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Arxiv

10+阅读 · 2020年3月31日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

VIP会员

文章信息

相关主题

相关VIP内容

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

96+阅读 · 2020年4月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

【ACML2025教程】迈向鲁棒且可信的大语言模型：问题与缓解策略

《利用人工智能改善军事警察行动：当下现状探索》最新95页报告

Google《AI智能体企业应用手册报告》，46页pdf

面向现代武装力量的高级AI驱动军事模拟与训练软件

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition

Arxiv

0+阅读 · 2022年4月20日

Retrieval Enhanced Data Augmentation for Question Answering on Privacy Policies

Retrieval Enhanced Data Augmentation for Question Answering on Privacy Policies

Arxiv

0+阅读 · 2022年4月19日

A Survey on Visual Transformer

Arxiv

19+阅读 · 2020年12月23日

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Arxiv

10+阅读 · 2020年3月31日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

相关基金

精神分裂症易感因子ErbB4对篮状细胞和吊灯状细胞神经环路发育的调控和机制

国家自然科学基金

0+阅读 · 2014年12月31日

β-catenin/Ets1复合体在胶质母细胞瘤中对hTERT表达调控机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于时间序列的农田防护林年龄结构遥感识别研究

国家自然科学基金

0+阅读 · 2013年12月31日

lncRNAs和miR-592的相互作用对mESC向神经元分化的影响

国家自然科学基金

0+阅读 · 2012年12月31日

Cystatin B缺失与Prion疾病自噬作用机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员