Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization

Bottleneck layer · MoDELS · Speech enhancement · Discretization · Layer

September 28, 2022

Xiao-Ying Zhao,Qiu-Shi Zhu,Jie Zhang

From arXiv. Accepted to APSIPA ASC 2022.

With the development of deep learning, neural network-based speech enhancement (SE) models have shown excellent performance. Meanwhile, self-supervised pre-trained models have been shown to be applicable to various downstream tasks. In this paper, we consider applying a pre-trained model to the real-time SE problem. Specifically, the encoder and bottleneck layer of the DEMUCS model are initialized with the self-supervised pre-trained WavLM model, the convolutions in the encoder are replaced by causal convolutions, and the transformer encoder in the bottleneck layer uses a causal attention mask. In addition, since discretizing the noisy speech representations is more beneficial for denoising, we use a quantization module to discretize the representations output by the bottleneck layer, which are then fed into the decoder to reconstruct the clean speech waveform. Experimental results on the Valentini dataset and an internal dataset show that pre-trained model based initialization can improve SE performance, and that the discretization operation suppresses the noise component in the representations to some extent, which can further improve performance.
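Replacing the encoder convolutions with causal ones is what makes real-time operation possible: each output frame may depend only on the current and past input samples. The following is a minimal pure-Python sketch of a single-channel causal 1-D convolution, assuming left zero-padding of `len(kernel) - 1` samples and no dilation. `causal_conv1d` is a hypothetical helper for illustration, not the paper's code; the actual DEMUCS encoder uses multi-channel strided convolutions with learned weights.

```python
from typing import List, Sequence


def causal_conv1d(x: Sequence[float], kernel: Sequence[float], stride: int = 1) -> List[float]:
    """Single-channel causal 1-D convolution (hypothetical helper).

    Computes cross-correlation, as deep-learning conv layers do. The input is
    left-padded with len(kernel) - 1 zeros, so the output at input index i
    only sees x[i - (K - 1)] .. x[i]; no future samples are used.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)  # padded[j] == x[j - (k - 1)]
    out = []
    # One output per stride step; padded[i : i + k] ends at original x[i].
    for i in range(0, len(x), stride):
        window = padded[i:i + k]
        out.append(sum(w * v for w, v in zip(kernel, window)))
    return out


print(causal_conv1d([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))
```

Changing a later input sample leaves all earlier outputs unchanged: `causal_conv1d([1.0, 2.0, 3.0, 100.0], [0.5, 0.5])` matches `[0.5, 1.5, 2.5, 3.5]` in its first three entries. In frameworks such as PyTorch, the same effect is usually obtained by padding only on the left before an ordinary convolution.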

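The causal attention mask plays the same role in the bottleneck transformer: frame t attends only to frames s ≤ t. The sketch below assumes single-head scaled dot-product attention without learned projections, so the hypothetical `causal_self_attention` takes precomputed queries, keys and values. The mask sets future scores to -inf before the softmax, which gives those positions exactly zero weight.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_mask(n: int) -> List[List[bool]]:
    """mask[t][s] is True where query t may attend to key s (i.e. s <= t)."""
    return [[s <= t for s in range(n)] for t in range(n)]


def causal_self_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with a causal mask (hypothetical helper).

    q, k, v are lists of T vectors; future keys receive weight 0.
    """
    n, d = len(q), len(q[0])
    mask = causal_mask(n)
    out = []
    for t in range(n):
        # Masked (future) positions get a score of -inf.
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d) if mask[t][s] else -math.inf
            for s in range(n)
        ]
        m = max(scores)  # finite, because s == t is always allowed
        exps = [math.exp(x - m) for x in scores]  # exp(-inf) == 0.0
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[s][j] for s, w in enumerate(weights)) for j in range(len(v[0]))])
    return out


print(causal_mask(3))
```

Since frame 0 can only attend to itself, its output is exactly `v[0]`. Editing a later value vector changes only the outputs at or after that position.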

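The abstract does not describe the quantization module in detail (for example, codebook size, or whether it resembles the Gumbel-softmax product quantizer used in wav2vec 2.0). The sketch below therefore substitutes plain nearest-neighbour vector quantization in the style of VQ-VAE. Each bottleneck frame is replaced by its closest codeword, so small perturbations of a frame map to the same codeword. This offers one intuition for how discretization can suppress part of the noise. `quantize` and the toy codebook are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize(
    frames: Sequence[Sequence[float]], codebook: Sequence[Sequence[float]]
) -> Tuple[List[int], List[List[float]]]:
    """Nearest-neighbour vector quantization (hypothetical helper).

    Replaces each frame with the codeword at the smallest squared Euclidean
    distance; returns the chosen indices and the quantized vectors.
    """
    indices, quantized = [], []
    for f in frames:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        best = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


codebook = [[0.0, 0.0], [1.0, 1.0]]
noisy = [[0.1, -0.1], [0.9, 1.2], [0.4, 0.4]]
print(quantize(noisy, codebook)[0])
```

Training such a quantizer end to end typically requires a straight-through estimator for the non-differentiable nearest-codeword choice, along with codebook and commitment losses as in VQ-VAE. Those parts are omitted here.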
