EMA2S: 端到端多式电传到语音系统 (EMA2S: An End-to-End Multimodal Articulatory-to-Speech System) - 专知论文

会员服务 ·

0

多峰值 · 端到端 · Performance · 基准 · 损失 ·

2021 年 6 月 9 日

EMA2S: An End-to-End Multimodal Articulatory-to-Speech System

翻译：EMA2S: 端到端多式电传到语音系统

Yu-Wen Chen,Kuo-Hsuan Hung,Shang-Yi Chuang,Jonathan Sherman,Wen-Chin Huang,Xugang Lu,Yu Tsao

Synthesized speech from articulatory movements can have real-world use for patients with vocal cord disorders, situations requiring silent speech, or in high-noise environments. In this work, we present EMA2S, an end-to-end multimodal articulatory-to-speech system that directly converts articulatory movements to speech signals. We use a neural-network-based vocoder combined with multimodal joint-training, incorporating spectrogram, mel-spectrogram, and deep features. The experimental results confirm that the multimodal approach of EMA2S outperforms the baseline system in terms of both objective evaluation and subjective evaluation metrics. Moreover, results demonstrate that joint mel-spectrogram and deep feature loss training can effectively improve system performance.

翻译：在这项工作中,我们展示了EMA2S,这是一个端到端的多式动脉交响系统,将动脉运动直接转换为语音信号。我们使用基于神经网络的电解器,结合多式联合培训,结合光谱、光谱和深层特征。实验结果证实,EMA2S的多式联运方法在客观评价和主观评价衡量标准方面都超过了基线系统。此外,结果还表明,联合光谱和深度特征损失培训可以有效地改善系统性能。

0

相关内容

多峰值

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

19+阅读 · 2020年2月26日

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

专知会员服务

26+阅读 · 2020年2月16日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【NLP| 推荐文章】语言语音处理（Speech and Language Processing(3rd ed.draft)）

专知会员服务

15+阅读 · 2019年11月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

GitHub标星近10万：只需5秒音源就能实时“克隆”你的声音！

GitHub标星近10万：只需5秒音源就能实时“克隆”你的声音！

CSDN

4+阅读 · 2019年11月14日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

TCN v2 + 3Dconv 运动信息

TCN v2 + 3Dconv 运动信息

CreateAMind

4+阅读 · 2019年1月8日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

专知

16+阅读 · 2018年5月14日

【论文推荐】最新六篇行人再识别相关论文—特定视角、多目标、双注意匹配网络、联合属性-身份、迁移学习、多通道金字塔型

【论文推荐】最新六篇行人再识别相关论文—特定视角、多目标、双注意匹配网络、联合属性-身份、迁移学习、多通道金字塔型

专知

7+阅读 · 2018年4月13日

Github 项目推荐 | 用 Keras 实现的神经网络机器翻译

Github 项目推荐 | 用 Keras 实现的神经网络机器翻译

AI研习社

8+阅读 · 2018年3月11日

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

专知

25+阅读 · 2018年2月6日

Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis

Arxiv

0+阅读 · 2021年8月4日

A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English

Arxiv

0+阅读 · 2021年8月3日

End to End Bangla Speech Synthesis

Arxiv

0+阅读 · 2021年8月1日

USC: An Open-Source Uzbek Speech Corpus and Initial Speech Recognition Experiments

Arxiv

0+阅读 · 2021年7月30日

VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs

Arxiv

3+阅读 · 2021年1月29日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss

Arxiv

3+阅读 · 2020年2月2日

Meta Learning for End-to-End Low-Resource Speech Recognition

Meta Learning for End-to-End Low-Resource Speech Recognition

Arxiv

20+阅读 · 2019年10月26日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

End-to-End Speech Recognition From the Raw Waveform

Arxiv

3+阅读 · 2018年6月19日

VIP会员

文章信息

相关主题

相关VIP内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

19+阅读 · 2020年2月26日

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

【北邮-腾讯AI】自监督学习音视觉说话人认证，Self-supervised learning for audio-visual speaker diarization

专知会员服务

26+阅读 · 2020年2月16日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【NLP| 推荐文章】语言语音处理（Speech and Language Processing(3rd ed.draft)）

专知会员服务

15+阅读 · 2019年11月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《陆军战斗操练中的关键事件诊断》

《自适应训练辅助概念及其在空战管理员加速训练中的应用导论》最新126页

军事通信市场七大趋势概述

《抗干扰无人机蜂群行为的遗传算法方法》

相关资讯

GitHub标星近10万：只需5秒音源就能实时“克隆”你的声音！

GitHub标星近10万：只需5秒音源就能实时“克隆”你的声音！

CSDN

4+阅读 · 2019年11月14日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

TCN v2 + 3Dconv 运动信息

TCN v2 + 3Dconv 运动信息

CreateAMind

4+阅读 · 2019年1月8日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

专知

16+阅读 · 2018年5月14日

【论文推荐】最新六篇行人再识别相关论文—特定视角、多目标、双注意匹配网络、联合属性-身份、迁移学习、多通道金字塔型

【论文推荐】最新六篇行人再识别相关论文—特定视角、多目标、双注意匹配网络、联合属性-身份、迁移学习、多通道金字塔型

专知

7+阅读 · 2018年4月13日

Github 项目推荐 | 用 Keras 实现的神经网络机器翻译

Github 项目推荐 | 用 Keras 实现的神经网络机器翻译

AI研习社

8+阅读 · 2018年3月11日

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

专知

25+阅读 · 2018年2月6日

相关论文

Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis

Arxiv

0+阅读 · 2021年8月4日

A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English

Arxiv

0+阅读 · 2021年8月3日

End to End Bangla Speech Synthesis

Arxiv

0+阅读 · 2021年8月1日

USC: An Open-Source Uzbek Speech Corpus and Initial Speech Recognition Experiments

Arxiv

0+阅读 · 2021年7月30日

VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs

Arxiv

3+阅读 · 2021年1月29日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss

Arxiv

3+阅读 · 2020年2月2日

Meta Learning for End-to-End Low-Resource Speech Recognition

Meta Learning for End-to-End Low-Resource Speech Recognition

Arxiv

20+阅读 · 2019年10月26日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

End-to-End Speech Recognition From the Raw Waveform

Arxiv

3+阅读 · 2018年6月19日

微信扫码咨询专知VIP会员