Karaoker: Alignment-free singing voice synthesis with speech training data - Zhuanzhi Papers

MoDELS · Wasserstein GAN · Training data · Continuity · GaN

August 31, 2022

Karaoker: Alignment-free singing voice synthesis with speech training data

Panos Kakoulidis, Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, June Sig Sung, Gunu Jho, Pirros Tsiakoulis, Aimilios Chalamandaris

from arxiv, Accepted to INTERSPEECH 2022

Existing singing voice synthesis (SVS) models are usually trained on singing data and depend on either error-prone time-alignment and duration features or explicit music score information. In this paper, we propose Karaoker, a multispeaker Tacotron-based model conditioned on voice characteristic features that is trained exclusively on spoken data without requiring time-alignments. Karaoker synthesizes singing voice and transfers style following a multi-dimensional template extracted from a source waveform of an unseen singer/speaker. The model is jointly conditioned with a single deep convolutional encoder on continuous data including pitch, intensity, harmonicity, formants, cepstral peak prominence and octaves. We extend the text-to-speech training objective with feature reconstruction, classification and speaker identification tasks that guide the model to an accurate result. In addition to multitasking, we also employ a Wasserstein GAN training scheme as well as new losses on the acoustic model's output to further refine the quality of the model.
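
The abstract does not give loss formulas or weights, so the following is only a minimal sketch of how a multi-task objective could be combined with a Wasserstein GAN term. It uses the standard Wasserstein critic and generator losses. The function names, the task names in the dictionary, the unit weights and the example scores are all hypothetical, and the Lipschitz constraint on the critic (weight clipping or a gradient penalty) is left out for brevity.

```python
from statistics import fmean


def critic_loss(real_scores, fake_scores):
    """Standard Wasserstein critic loss: lower when real frames score above generated ones."""
    return fmean(fake_scores) - fmean(real_scores)


def generator_adversarial_loss(fake_scores):
    """Standard Wasserstein generator loss: lower when the critic scores generated frames highly."""
    return -fmean(fake_scores)


def total_generator_loss(task_losses, weights, fake_scores, adv_weight=1.0):
    """Weighted sum of the auxiliary task losses plus the adversarial term.

    The weighting scheme is an assumption made for illustration only.
    """
    weighted = sum(weights[name] * value for name, value in task_losses.items())
    return weighted + adv_weight * generator_adversarial_loss(fake_scores)


# Hypothetical per-task loss values for one training step.
task_losses = {
    "tts": 0.50,
    "feature_reconstruction": 0.20,
    "classification": 0.10,
    "speaker_id": 0.05,
}
weights = {name: 1.0 for name in task_losses}  # assumed equal weighting

# Hypothetical critic scores for real and generated spectrogram frames.
real_scores = [0.9, 1.1, 1.0]
fake_scores = [0.2, 0.4, 0.3]

print("critic loss:", critic_loss(real_scores, fake_scores))
print("generator loss:", total_generator_loss(task_losses, weights, fake_scores))
```

In an actual training loop the critic and the acoustic model would be updated in alternation, with the critic minimizing its loss and the acoustic model minimizing the combined objective.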
