Sound synthesis is a complex field that requires domain expertise. Manually tuning synthesizer parameters to match a specific sound can be an exhausting task, even for experienced sound engineers. In this paper, we introduce InverSynth, an automatic method for tuning synthesizer parameters to match a given input sound. InverSynth is based on strided convolutional neural networks and is capable of inferring the synthesizer parameter configuration from the input spectrogram and even from the raw audio. The effectiveness of InverSynth is demonstrated on a subtractive synthesizer with four frequency-modulated oscillators, an envelope generator, and a gater effect. We present extensive quantitative and qualitative results that showcase the superiority of InverSynth over several baselines. Furthermore, we show that the network depth is an important factor that contributes to the prediction accuracy.
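To illustrate what "strided" means for a convolution over a spectrogram, the following is a minimal pure-Python sketch. It is not the InverSynth architecture: the function name `strided_conv2d`, the 8x8 toy input, the 2x2 averaging kernel, and the stride of 2 are all assumptions chosen for illustration. It only shows how a stride greater than 1 downsamples the time-frequency grid without a separate pooling step.

```python
def strided_conv2d(x, kernel, stride):
    """Valid (no padding) 2D cross-correlation, same stride on both axes."""
    rows, cols = len(x), len(x[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    # Standard output-size formula for a convolution without padding.
    out_rows = (rows - k_rows) // stride + 1
    out_cols = (cols - k_cols) // stride + 1
    out = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            acc = 0.0
            for a in range(k_rows):
                for b in range(k_cols):
                    acc += x[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out


# Toy 8x8 "spectrogram" (frequency bins x time frames); values are arbitrary.
spec = [[float(f + t) for t in range(8)] for f in range(8)]

# Hypothetical fixed 2x2 averaging kernel; a trained network would learn
# its kernel weights from data instead.
kernel = [[0.25, 0.25], [0.25, 0.25]]

features = strided_conv2d(spec, kernel, stride=2)
print(len(features), len(features[0]))  # each axis is halved: 4 4
```

Stacking several such layers shrinks the spectrogram step by step into a compact feature map, from which a final layer could predict the synthesizer parameters. The abstract does not specify the layer sizes or output encoding, so none are assumed here.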