Musicians and audio engineers sculpt and transform their sounds by connecting multiple processors, forming an audio processing graph. However, most deep-learning methods overlook this real-world practice and assume a fixed graph structure. To bridge this gap, we develop a system that reconstructs the entire graph from a given reference audio signal. We first generate a realistic dataset of graph-reference pairs and then train a simple blind estimation system composed of a convolutional reference encoder and a transformer-based graph decoder. We apply our model to two tasks: singing voice effects estimation and drum mixing estimation. Evaluation results show that our method can reconstruct complex signal routings, including multi-band processing and sidechaining.
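To make the notion of an audio processing graph concrete, the following is a minimal sketch of a graph with a sidechain connection, rendered in topological order. It is not the estimation system described above; the processor names (`gain`, `sidechain_duck`, `mix`), the `render` helper, and the toy sample values are all hypothetical, chosen only to illustrate how processors and routings form a graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy processors. Each takes a list of input signals
# (lists of float samples) and returns one output signal.
def gain(factor):
    return lambda inputs: [factor * x for x in inputs[0]]

def mix():
    return lambda inputs: [sum(samples) for samples in zip(*inputs)]

def sidechain_duck(threshold, reduction):
    # inputs[0] is the main signal, inputs[1] is the sidechain (control) signal.
    def process(inputs):
        main, side = inputs
        return [m * reduction if abs(s) > threshold else m
                for m, s in zip(main, side)]
    return process

def render(nodes, edges, sources):
    """Run every processor once, visiting inputs before the nodes that use them.

    nodes:   processor name -> callable
    edges:   processor name -> ordered list of input names
    sources: source name -> list of samples
    """
    outputs = dict(sources)
    order = TopologicalSorter({name: edges[name] for name in nodes}).static_order()
    for name in order:
        if name in outputs:  # source signals are already available
            continue
        outputs[name] = nodes[name]([outputs[p] for p in edges[name]])
    return outputs

# Toy graph: the vocal is attenuated, ducked by the kick, then mixed with the kick.
sources = {"vocal": [1.0, 1.0, 1.0, 1.0], "kick": [0.0, 1.0, 0.0, 1.0]}
nodes = {
    "vocal_gain": gain(0.5),
    "duck": sidechain_duck(threshold=0.5, reduction=0.25),
    "out": mix(),
}
edges = {
    "vocal_gain": ["vocal"],
    "duck": ["vocal_gain", "kick"],  # second input is the sidechain
    "out": ["duck", "kick"],
}
output = render(nodes, edges, sources)["out"]
```

In this representation, a graph is fully described by the node types, their parameters, and the ordered edges, which is the kind of structure a graph decoder would have to predict from audio alone.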