利用转让学习和分光增强改进语音情感识别 (Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation) - 专知论文

会员服务 ·

0

迁移学习 · 汇聚层 · 学成 · MoDELS · 统计量 ·

2021 年 8 月 16 日

Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation

翻译：利用转让学习和分光增强改进语音情感识别

Sarala Padi,Seyed Omid Sadjadi,Dinesh Manocha,Ram D. Sriram

from arxiv, Accepted at ACM/SIGCHI ICMI'21

Automatic speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction. One of the main challenges in SER is data scarcity, i.e., insufficient amounts of carefully labeled data to build and fully explore complex deep learning models for emotion classification. This paper aims to address this challenge using a transfer learning strategy combined with spectrogram augmentation. Specifically, we propose a transfer learning approach that leverages a pre-trained residual network (ResNet) model including a statistics pooling layer from speaker recognition trained using large amounts of speaker-labeled data. The statistics pooling layer enables the model to efficiently process variable-length input, thereby eliminating the need for sequence truncation which is commonly used in SER systems. In addition, we adopt a spectrogram augmentation technique to generate additional training data samples by applying random time-frequency masks to log-mel spectrograms to mitigate overfitting and improve the generalization of emotion recognition models. We evaluate the effectiveness of our proposed approach on the interactive emotional dyadic motion capture (IEMOCAP) dataset. Experimental results indicate that the transfer learning and spectrogram augmentation approaches improve the SER performance, and when combined achieve state-of-the-art results.

翻译：自动言语情绪识别(SER)是一项具有挑战性的任务,在人与计算机的自然互动中发挥着关键作用。SER的主要挑战之一是数据稀缺,即没有足够数量经过仔细标记的数据来建立和充分探索复杂的情感分类深层学习模式。本文件旨在利用转让学习战略以及光谱增强来应对这一挑战。具体地说,我们建议采用转让学习方法,利用预先培训的残余网络(ResNet)模型,包括使用大量语音标签数据培训的语音识别的统计集合层。统计数据集合层使模型能够高效处理变长输入,从而消除SER系统中常用的序列脱线需求。此外,我们采用光谱增强技术,通过随机使用时频掩光仪来生成更多的培训数据样本,以缓解对情绪识别模型的过度调整和改进。我们评估了我们提议的交互式情感运动捕获方法(IEMOCAP)的有效性。实验结果表明,转移学习和光谱扩增方法提高了SER的性能,并在实现综合状态时实现了结果。

0

相关内容

迁移学习

迁移学习（Transfer Learning）是一种机器学习方法，是把一个领域（即源领域）的知识，迁移到另外一个领域（即目标领域），使得目标领域能够取得更好的学习效果。迁移学习（TL）是机器学习（ML）中的一个研究问题，着重于存储在解决一个问题时获得的知识并将其应用于另一个但相关的问题。例如，在学习识别汽车时获得的知识可以在尝试识别卡车时应用。尽管这两个领域之间的正式联系是有限的，但这一领域的研究与心理学文献关于学习转移的悠久历史有关。从实践的角度来看，为学习新任务而重用或转移先前学习的任务中的信息可能会显着提高强化学习代理的样本效率。

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

【KDD2021】深度图卷积网络混合归一化的精确和多样化推荐

专知会员服务

22+阅读 · 2021年8月23日

【NUS-Xavier 教授】图神经网络应用概述，15页ppt

专知会员服务

53+阅读 · 2021年6月30日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

专知会员服务

26+阅读 · 2020年2月16日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

专知会员服务

60+阅读 · 2019年11月16日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

鲁棒机器学习相关文献集

鲁棒机器学习相关文献集

专知

8+阅读 · 2019年8月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新五篇图像分割相关论文—R2U-Net、ScatterNet混合深度学习、分离卷积编解码、控制、Embedding

【论文推荐】最新五篇图像分割相关论文—R2U-Net、ScatterNet混合深度学习、分离卷积编解码、控制、Embedding

专知

7+阅读 · 2018年2月26日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Sparsifying Neural Network Connections for Face Recognition

Sparsifying Neural Network Connections for Face Recognition

统计学习与视觉计算组

7+阅读 · 2017年6月10日

Adapting TTS models For New Speakers using Transfer Learning

Arxiv

0+阅读 · 2021年10月12日

Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition

Arxiv

0+阅读 · 2021年10月11日

Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition

Arxiv

0+阅读 · 2021年10月9日

SCaLa: Supervised Contrastive Learning for End-to-End Automatic Speech Recognition

SCaLa: Supervised Contrastive Learning for End-to-End Automatic Speech Recognition

Arxiv

0+阅读 · 2021年10月8日

Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask

Arxiv

0+阅读 · 2021年10月8日

StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis

Arxiv

0+阅读 · 2021年10月8日

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

Arxiv

8+阅读 · 2021年6月10日

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Arxiv

7+阅读 · 2019年4月18日

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

Arxiv

5+阅读 · 2018年6月4日

Conditional Random Field and Deep Feature Learning for Hyperspectral Image Segmentation

Arxiv

11+阅读 · 2017年12月27日

VIP会员

文章信息

相关主题

相关VIP内容

【KDD2021】深度图卷积网络混合归一化的精确和多样化推荐

专知会员服务

22+阅读 · 2021年8月23日

【NUS-Xavier 教授】图神经网络应用概述，15页ppt

专知会员服务

53+阅读 · 2021年6月30日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

专知会员服务

26+阅读 · 2020年2月16日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

专知会员服务

60+阅读 · 2019年11月16日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《复杂工程系统模型驱动设计决策支持系统：早期设计阶段挑战》最新138页

《日本陆上自卫队2040年作战方式与未来作战研究》最新23页slides

人工智能作为战争武器

《后勤保障》最新23页

相关资讯

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

鲁棒机器学习相关文献集

鲁棒机器学习相关文献集

专知

8+阅读 · 2019年8月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新五篇图像分割相关论文—R2U-Net、ScatterNet混合深度学习、分离卷积编解码、控制、Embedding

【论文推荐】最新五篇图像分割相关论文—R2U-Net、ScatterNet混合深度学习、分离卷积编解码、控制、Embedding

专知

7+阅读 · 2018年2月26日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Sparsifying Neural Network Connections for Face Recognition

Sparsifying Neural Network Connections for Face Recognition

统计学习与视觉计算组

7+阅读 · 2017年6月10日

相关论文

Adapting TTS models For New Speakers using Transfer Learning

Arxiv

0+阅读 · 2021年10月12日

Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition

Arxiv

0+阅读 · 2021年10月11日

Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition

Arxiv

0+阅读 · 2021年10月9日

SCaLa: Supervised Contrastive Learning for End-to-End Automatic Speech Recognition

SCaLa: Supervised Contrastive Learning for End-to-End Automatic Speech Recognition

Arxiv

0+阅读 · 2021年10月8日

Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask

Arxiv

0+阅读 · 2021年10月8日

StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis

Arxiv

0+阅读 · 2021年10月8日

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

Arxiv

8+阅读 · 2021年6月10日

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Arxiv

7+阅读 · 2019年4月18日

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

Arxiv

5+阅读 · 2018年6月4日

Conditional Random Field and Deep Feature Learning for Hyperspectral Image Segmentation

Arxiv

11+阅读 · 2017年12月27日

微信扫码咨询专知VIP会员