One-shot voice conversion has received significant attention because it requires only a single utterance from each of the source and target speakers, neither of whom needs to be seen during training. However, existing one-shot voice conversion approaches are not stable for unseen speakers, because a speaker embedding extracted from a single utterance of an unseen speaker is unreliable. In this paper, we propose a deep discriminative speaker encoder that extracts speaker embeddings from one utterance more effectively. Specifically, the speaker encoder first integrates a residual network with a squeeze-and-excitation network to extract discriminative frame-level speaker information by modeling frame-wise and channel-wise interdependence in the features. An attention mechanism is then introduced to further emphasize speaker-related information by assigning different weights to the frame-level speaker information. Finally, a statistics pooling layer aggregates the weighted frame-level speaker information into an utterance-level speaker embedding. Experimental results demonstrate that the proposed speaker encoder improves the robustness of one-shot voice conversion for unseen speakers and outperforms baseline systems in terms of speech quality and speaker similarity.
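The following is a minimal PyTorch sketch of the encoder pipeline described above, assuming 80-dimensional mel-spectrogram input of shape (batch, n_mels, frames); the layer widths, kernel sizes, SE reduction ratio, and the single-head attention used for frame weighting are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of an SE-ResNet speaker encoder with attentive
# statistics pooling; all hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels using global temporal context."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):                              # x: (B, C, T)
        s = x.mean(dim=2)                              # squeeze over frames
        s = torch.sigmoid(self.fc2(F.relu(self.fc1(s))))
        return x * s.unsqueeze(2)                      # excite: scale channels


class SEResBlock(nn.Module):
    """1-D residual block with an SE module on the residual branch."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm1d(channels)
        self.se = SEBlock(channels)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.se(self.bn2(self.conv2(y)))
        return F.relu(x + y)                           # residual connection


class SpeakerEncoder(nn.Module):
    """Frame-level SE-ResNet -> attention weights -> statistics pooling."""
    def __init__(self, n_mels: int = 80, channels: int = 256,
                 n_blocks: int = 3, embed_dim: int = 256):
        super().__init__()
        self.front = nn.Conv1d(n_mels, channels, 5, padding=2)
        self.blocks = nn.Sequential(*[SEResBlock(channels)
                                      for _ in range(n_blocks)])
        self.attn = nn.Conv1d(channels, 1, 1)          # scalar score per frame
        self.proj = nn.Linear(2 * channels, embed_dim)

    def forward(self, mel):                            # mel: (B, n_mels, T)
        h = self.blocks(F.relu(self.front(mel)))       # frame-level features
        w = torch.softmax(self.attn(h), dim=2)         # attention over frames
        mu = (h * w).sum(dim=2)                        # weighted mean
        var = (h * h * w).sum(dim=2) - mu ** 2
        sigma = torch.sqrt(var.clamp(min=1e-8))        # weighted std deviation
        return self.proj(torch.cat([mu, sigma], dim=1))  # utterance embedding
```

For example, `SpeakerEncoder()(torch.randn(4, 80, 200))` returns a batch of four 256-dimensional embeddings; the weighted mean-and-standard-deviation pooling is what turns a variable-length frame sequence into a fixed-size utterance-level vector regardless of utterance duration.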