We present a Multi-Window Data Augmentation (MWA-SER) approach for speech emotion recognition. MWA-SER is a unimodal approach that focuses on two key concepts: designing the speech augmentation method and building a deep learning model to recognize the underlying emotion of an audio signal. Our proposed multi-window augmentation approach generates additional data samples from the speech signal by employing multiple window sizes in the audio feature extraction process. We show that our augmentation method, combined with a deep learning model, improves speech emotion recognition performance. We evaluate our approach on three benchmark datasets: IEMOCAP, SAVEE, and RAVDESS, and show that the multi-window model improves SER performance and outperforms a single-window model. Finding the best window size is an essential step in audio feature extraction; we perform extensive experimental evaluations to identify the best window choice and to explore the windowing effect in SER analysis.
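To make the core idea concrete, the following is a minimal sketch of how multiple window sizes can turn one labelled utterance into several feature sequences. The window sizes, hop length, and log-energy feature used here are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def frame_signal(signal, win_size, hop):
    """Slice a 1-D signal into overlapping frames of length win_size."""
    n_frames = 1 + (len(signal) - win_size) // hop
    return np.stack([signal[i * hop : i * hop + win_size]
                     for i in range(n_frames)])

def multi_window_features(signal, win_sizes, hop=160):
    """Extract one feature sequence per window size (hypothetical
    illustration of multi-window augmentation)."""
    views = []
    for w in win_sizes:
        frames = frame_signal(signal, w, hop)
        windowed = frames * np.hanning(w)  # taper each frame
        # simple per-frame log energy as a stand-in feature
        log_energy = np.log(np.sum(windowed ** 2, axis=1) + 1e-10)
        views.append(log_energy)
    return views

# Each window size yields a distinct feature sequence from the same
# utterance, so a single labelled clip becomes several training samples.
signal = np.random.randn(16000)            # 1 s of audio at 16 kHz
views = multi_window_features(signal, win_sizes=[400, 800, 1600])
```

In this sketch, the three window sizes produce three feature sequences of slightly different lengths from the same clip; each would carry the clip's emotion label, multiplying the effective training data.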