Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis


May 9, 2022


Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Frank Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin Jin

From arXiv; submitted to Interspeech.

This paper introduces Muskits, a new open-source platform for end-to-end music processing that focuses mainly on end-to-end singing voice synthesis (E2E-SVS). Muskits supports state-of-the-art SVS models, including RNN SVS, transformer SVS, and XiaoiceSing. Its design follows the style of the widely used speech processing toolkits ESPnet and Kaldi for data preprocessing, training, and recipe pipelines. To the best of our knowledge, this toolkit is the first platform that allows a fair and highly reproducible comparison between several published works in SVS. In addition, we demonstrate several advanced usages based on the toolkit's functionalities, including multilingual training and transfer learning. This paper describes the major framework of Muskits, its functionalities, and experimental results in single-singer, multi-singer, multilingual, and transfer learning scenarios. The toolkit is publicly available at https://github.com/SJTMusicTeam/Muskits.

