Sound synthesis is a complex field that requires domain expertise. Manually tuning synthesizer parameters to match a specific sound can be an exhausting task, even for experienced sound engineers. In this paper, we propose an automatic method for tuning synthesizer parameters to match a given input sound. The method is based on strided Convolutional Neural Networks and can infer the synthesizer parameter configuration from the input spectrogram, and even from the raw audio. We demonstrate the effectiveness of our method on a subtractive synthesizer with four frequency-modulated oscillators, an envelope generator, and a gater effect. Extensive quantitative and qualitative results show that our model outperforms several baselines. Furthermore, we show that network depth is an important factor contributing to prediction accuracy.
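To make the term "strided convolution" concrete, the following minimal sketch shows how a stride greater than one shrinks the input at each layer, so that a stack of such layers reduces a spectrogram to a compact representation without separate pooling layers. It is an illustration only: the function names, the 128×128 input size, and the four-layer stack are hypothetical and are not the architecture evaluated in the paper.

```python
def conv_output_size(n, kernel, stride, padding=0):
    """Length of one axis after a convolution (floor convention)."""
    return (n + 2 * padding - kernel) // stride + 1


def strided_conv1d(signal, kernel, stride):
    """Valid (unpadded) 1-D cross-correlation with the given stride."""
    k = len(kernel)
    n_out = conv_output_size(len(signal), k, stride)
    return [
        sum(signal[i * stride + j] * kernel[j] for j in range(k))
        for i in range(n_out)
    ]


def trace_shapes(height, width, layers):
    """Follow a (height, width) input through a stack of conv layers.

    `layers` is a list of (kernel, stride, padding) tuples, each applied
    identically to both axes.
    """
    shapes = [(height, width)]
    for kernel, stride, padding in layers:
        height = conv_output_size(height, kernel, stride, padding)
        width = conv_output_size(width, kernel, stride, padding)
        shapes.append((height, width))
    return shapes


# Hypothetical input size and layer stack, chosen only for illustration:
# four layers with a 3x3 kernel, stride 2, and padding 1.
HYPOTHETICAL_LAYERS = [(3, 2, 1)] * 4
shapes = trace_shapes(128, 128, HYPOTHETICAL_LAYERS)

# A stride of 2 on a toy raw-audio snippet halves the number of samples.
downsampled = strided_conv1d([1, 2, 3, 4, 5, 6], [1, 1], stride=2)
```

With these assumed settings each layer halves both axes, so deeper stacks see progressively coarser summaries of the input; a regression head on the final feature map would then output the synthesizer parameters.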