A synthesizer is a type of electronic musical instrument that is widely used in modern music production and sound design. Each parameter configuration of a synthesizer produces a unique timbre and can be viewed as a unique instrument. Estimating the parameter configuration that best reproduces a given timbre, known as the synthesizer parameter estimation problem, is important yet difficult. We propose Sound2Synth, a multi-modal deep-learning-based pipeline, together with Prime-Dilated Convolution (PDC), a network structure specially designed to solve this problem. Our method achieves not only state-of-the-art (SOTA) results but also the first real-world applicable results on Dexed, a popular FM synthesizer.