Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders - 专知 (Zhuanzhi) Papers

April 23, 2021

Yide Yu, Amin Honarmandi Shandiz, László Tóth

From arXiv; 6 pages, 4 tables, 3 figures

Several approaches exist for recording articulatory movements, such as electromagnetic and permanent magnetic articulography, ultrasound tongue imaging and surface electromyography. Although magnetic resonance imaging (MRI) is more costly than these approaches, recent developments in this area now allow real-time MRI videos of the articulators to be recorded at an acceptable resolution. Here, we experiment with reconstructing the speech signal from a real-time MRI recording using deep neural networks. Instead of estimating speech directly, our networks are trained to output a spectral vector, from which we reconstruct the speech signal using the WaveGlow neural vocoder. We compare the performance of three deep neural architectures for the estimation task, combining convolutional (CNN) and recurrence-based (LSTM) neural layers. Besides the mean absolute error (MAE) of our networks, we also evaluate our models by comparing the resulting speech signals using several objective speech quality metrics: mel-cepstral distortion (MCD), Short-Time Objective Intelligibility (STOI), Perceptual Evaluation of Speech Quality (PESQ) and Signal-to-Distortion Ratio (SDR). The results indicate that our approach can successfully reconstruct the gross spectral shape, but further improvements are needed to reproduce the fine spectral details.
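To make the evaluation side of the abstract more concrete, here is a minimal NumPy sketch of three of the metrics it names: MAE on the predicted spectral vectors, MCD on cepstral coefficients, and a plain energy-ratio SDR on waveforms. The function names, the array shapes, and the choice of the simplest textbook definition for each metric are assumptions made for illustration; the abstract does not say which exact variants or toolkits the authors used. STOI and PESQ are left out because they are considerably more involved and are normally computed with dedicated libraries.

```python
import numpy as np


def mean_absolute_error(target, estimate):
    """MAE averaged over all frames and spectral bins."""
    target = np.asarray(target, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.mean(np.abs(target - estimate)))


def mel_cepstral_distortion(target_mcep, estimate_mcep):
    """Frame-averaged mel-cepstral distortion (MCD) in dB.

    Both inputs are (frames, coefficients) arrays. Assumption: the 0th
    (energy) coefficient has already been dropped and the two sequences
    are time-aligned.
    """
    diff = np.asarray(target_mcep, dtype=float) - np.asarray(estimate_mcep, dtype=float)
    # Per-frame Euclidean distance, scaled by sqrt(2) as in the usual MCD formula.
    per_frame = np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float((10.0 / np.log(10.0)) * np.mean(per_frame))


def signal_to_distortion_ratio(reference, estimate):
    """Energy-ratio SDR in dB, with no scaling or filtering compensation.

    Assumption: the estimate differs from the reference, so the
    distortion energy is non-zero.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = reference - estimate
    return float(10.0 * np.log10(np.sum(reference ** 2) / np.sum(distortion ** 2)))


if __name__ == "__main__":
    # Toy data standing in for network outputs; not taken from the paper.
    spec_true = np.array([[1.0, 2.0], [3.0, 4.0]])
    spec_pred = np.array([[1.0, 3.0], [5.0, 4.0]])
    print("MAE:", mean_absolute_error(spec_true, spec_pred))  # 0.75
    print("MCD:", mel_cepstral_distortion(spec_true, spec_pred))

    wave_true = np.ones(4)
    wave_pred = 0.9 * np.ones(4)
    print("SDR:", signal_to_distortion_ratio(wave_true, wave_pred))
```

MAE and MCD are computed on frame-level features, so they measure how well the network predicts the spectral representation. SDR is computed on waveforms, so it also reflects whatever the vocoder adds or loses during synthesis.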
