FM synthesis is a well-known algorithm used to generate complex timbre from a compact set of design primitives. Since FM synthesizers typically feature a MIDI interface, they are usually impractical to control from an audio source. Differentiable Digital Signal Processing (DDSP), on the other hand, has enabled nuanced audio rendering by Deep Neural Networks (DNNs) that learn to control differentiable synthesis layers from arbitrary sound inputs. The training process involves a corpus of audio for supervision and spectral reconstruction loss functions. While such functions are effective at matching spectral amplitudes, they lack pitch direction, which can hinder the joint optimization of the parameters of FM synthesizers. In this paper, we take steps towards enabling continuous control of a well-established FM synthesis architecture from an audio input. First, we discuss a set of design constraints that ease spectral optimization of a differentiable FM synthesizer via a standard reconstruction loss. Next, we present Differentiable DX7 (DDX7), a lightweight architecture for neural FM resynthesis of musical instrument sounds in terms of a compact set of parameters. We train the model on instrument samples extracted from the URMP dataset and quantitatively demonstrate comparable audio quality against selected benchmarks.
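To make the "compact set of design primitives" concrete, a minimal sketch of 2-operator FM follows: a modulator oscillator phase-modulates a carrier, and a single modulation index controls how many sidebands (and hence how complex a timbre) appear. The function name and parameters are illustrative, not from the paper.

```python
import numpy as np

def fm_tone(f_carrier, f_mod, index, dur=1.0, sr=16000):
    """Classic 2-operator FM: sin(2*pi*fc*t + I*sin(2*pi*fm*t)).

    f_carrier, f_mod : carrier and modulator frequencies in Hz
    index            : modulation index I; 0 yields a pure sine,
                       larger values add sidebands at fc +/- k*fm
    """
    t = np.arange(int(dur * sr)) / sr
    return np.sin(2 * np.pi * f_carrier * t + index * np.sin(2 * np.pi * f_mod * t))
```

With `index=0` the output is a plain sine; sweeping the index continuously morphs the spectrum, which is what makes FM attractive as a differentiable synthesis layer.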
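The spectral reconstruction loss mentioned above can be sketched as a DDSP-style multi-resolution STFT loss: L1 distances between linear and log magnitude spectrograms at several FFT sizes. This is a minimal NumPy sketch of the general technique, not the paper's exact loss; window, hop, and epsilon choices are assumptions.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    # Magnitude spectrogram via framed FFT with a Hann window.
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def multiscale_spectral_loss(x, y, fft_sizes=(256, 512, 1024)):
    # Sum, over several resolutions, of L1 distances between magnitude
    # spectrograms plus a log-magnitude term. Such losses match spectral
    # amplitudes well but give no explicit gradient toward the right pitch.
    loss = 0.0
    for n_fft in fft_sizes:
        hop = n_fft // 4
        X, Y = stft_mag(x, n_fft, hop), stft_mag(y, n_fft, hop)
        loss += np.mean(np.abs(X - Y))
        loss += np.mean(np.abs(np.log(X + 1e-6) - np.log(Y + 1e-6)))
    return loss
```

The loss is zero for identical signals and grows as the spectra diverge; its lack of pitch direction is what motivates the design constraints the paper discusses.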