In this paper, we introduce a pre-trained audio-visual Transformer trained on more than 500k utterances from nearly 4,000 celebrities in the VoxCeleb2 dataset for human behavior understanding. The model aims to capture and extract useful information from the interactions between human facial and auditory behaviors, with applications in emotion recognition. We evaluate the model on two datasets, namely CREMA-D (emotion classification) and MSP-IMPROV (continuous emotion regression). Experimental results show that fine-tuning the pre-trained model improves emotion classification accuracy by 5-7% and the Concordance Correlation Coefficient (CCC) in continuous emotion recognition by 0.03-0.09 compared to the same model trained from scratch. We also demonstrate the robustness of fine-tuning the pre-trained model in a low-resource setting: with only 10% of the original training set, fine-tuning improves emotion recognition accuracy by at least 10% and the CCC score by at least 0.1 in continuous emotion recognition.
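For reference, the CCC used above to score continuous emotion regression measures agreement between predicted and ground-truth labels, penalizing both low correlation and shifts in mean or scale. The following is a minimal NumPy sketch of the standard CCC formula, not the authors' evaluation code:

```python
import numpy as np

def concordance_ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Concordance Correlation Coefficient between labels and predictions.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    """
    mean_true, mean_pred = y_true.mean(), y_pred.mean()
    var_true, var_pred = y_true.var(), y_pred.var()
    # Population covariance between ground truth and predictions.
    cov = np.mean((y_true - mean_true) * (y_pred - mean_pred))
    return 2.0 * cov / (var_true + var_pred + (mean_true - mean_pred) ** 2)
```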