In this work, we define a diffusion-based generative model capable of both music synthesis and source separation by learning the score of the joint probability density of sources sharing a context. Alongside the classic total inference tasks (i.e., generating a mixture, separating the sources), we also introduce and experiment with the partial inference task of source imputation, where we generate a subset of the sources given the others (e.g., play a piano track that goes well with the drums). Additionally, we introduce a novel inference method for the separation task. We train our model on Slakh2100, a standard dataset for musical source separation, provide qualitative results in the generation settings, and showcase competitive quantitative results in the separation setting. Our method is the first example of a single model that can handle both generation and separation tasks, thus representing a step toward general audio models.
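To make the source-imputation idea concrete, here is a minimal toy sketch (not the paper's actual model or sampler) of how a score-based diffusion model over stacked sources can generate missing sources conditioned on observed ones: at each annealed-Langevin step, the observed sources are clamped to a freshly noised copy of themselves, so the generated sources stay consistent with them. The names `toy_score` and `impute_sources` are hypothetical, and the closed-form Gaussian score stands in for a trained network.

```python
import numpy as np

def toy_score(x, sigma):
    # Stand-in for a learned joint score network: the exact score of a
    # standard-normal prior perturbed with noise level sigma,
    # i.e. grad_x log N(x; 0, 1 + sigma^2).
    return -x / (1.0 + sigma ** 2)

def impute_sources(score_fn, known, known_mask, num_steps=50,
                   sigma_max=1.0, sigma_min=0.01, seed=0):
    """Annealed Langevin sampler that fills in the unobserved sources.

    known:      array (num_sources, length), valid where known_mask is True
    known_mask: boolean array of the same shape
    """
    rng = np.random.default_rng(seed)
    sigmas = np.geomspace(sigma_max, sigma_min, num_steps)
    x = sigma_max * rng.normal(size=known.shape)
    for sigma in sigmas:
        step = 0.5 * sigma ** 2  # Langevin step size tied to the noise level
        # Clamp observed sources to a noised copy at the current level, so
        # the missing sources are denoised jointly with (and conditioned on)
        # the given ones.
        x = np.where(known_mask, known + sigma * rng.normal(size=x.shape), x)
        x = x + step * score_fn(x, sigma) \
              + np.sqrt(step) * rng.normal(size=x.shape)
    # Restore the exact observations in the final output.
    return np.where(known_mask, known, x)

# Toy usage: 4 sources of 8 samples each; only the drum track (row 1) is given.
known = np.zeros((4, 8))
known[1] = 0.3
mask = np.zeros((4, 8), dtype=bool)
mask[1] = True
out = impute_sources(toy_score, known, mask)
```

The clamping trick is the standard inpainting strategy for score-based models; the paper's actual method may differ in how it conditions on the observed sources.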