Despite significant advances in deep models for music generation, the use of these techniques remains restricted to expert users. Before they can be democratized among musicians, generative models must first provide expressive control over the generation, as this is a prerequisite for integrating deep generative models into creative workflows. In this paper, we tackle this issue by introducing a deep generative audio model that provides expressive and continuous descriptor-based control, while remaining lightweight enough to be embedded in a hardware synthesizer. We enforce the controllability of real-time generation by explicitly removing salient musical features from the latent space using an adversarial confusion criterion. User-specified features are then reintroduced as additional conditioning information, allowing continuous control of the generation, akin to a synthesizer knob. We assess the performance of our method on a wide variety of sounds, including instrumental, percussive, and speech recordings, while providing both timbre and attribute transfer, enabling new ways of generating sounds.
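To make the core idea concrete, the sketch below illustrates one common way to combine an adversarial confusion criterion with attribute conditioning in a PyTorch-style setup. This is only an assumption-laden illustration: the module names (`Encoder`, `Decoder`, `AttributePredictor`), dimensions, losses, and the negative-MSE confusion term are hypothetical stand-ins, not the paper's actual architecture or training objective.

```python
# Minimal sketch of adversarial attribute removal + conditioning.
# Assumptions: frame-wise audio features of fixed size, a single scalar
# descriptor (e.g. a normalized loudness value) as the controllable attribute.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an audio frame to a latent code z."""
    def __init__(self, in_dim=512, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Reconstructs the frame from z concatenated with the user-specified attribute."""
    def __init__(self, latent_dim=16, attr_dim=1, out_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + attr_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim))
    def forward(self, z, attr):
        return self.net(torch.cat([z, attr], dim=-1))

class AttributePredictor(nn.Module):
    """Adversary that tries to recover the attribute from z alone."""
    def __init__(self, latent_dim=16, attr_dim=1):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, attr_dim))
    def forward(self, z):
        return self.net(z)

enc, dec, adv = Encoder(), Decoder(), AttributePredictor()
opt_model = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)
opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-4)
mse = nn.MSELoss()

def train_step(x, attr, lambda_conf=0.1):
    # 1) Update the adversary to predict the attribute from the latent code.
    with torch.no_grad():
        z = enc(x)
    opt_adv.zero_grad()
    adv_loss = mse(adv(z), attr)
    adv_loss.backward()
    opt_adv.step()

    # 2) Update encoder/decoder: reconstruct the input while confusing the
    #    adversary, so z no longer carries the attribute; the attribute is
    #    re-injected as decoder conditioning (the "synthesizer knob").
    opt_model.zero_grad()
    z = enc(x)
    recon = dec(z, attr)
    confusion = -mse(adv(z), attr)  # simplified confusion term: maximize adversary error
    loss = mse(recon, x) + lambda_conf * confusion
    loss.backward()
    opt_model.step()
    return loss.item()
```

At inference time, the same decoder can be driven with an attribute value chosen by the user rather than the one extracted from the input, which is what enables continuous, knob-like control over the generated sound.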