Analogous to colorization in computer vision, instrument separation assigns instrument labels (e.g., piano, guitar) to notes in unlabeled mixtures that contain only performance information. To address this problem, we adopt diffusion models and explicitly guide them to preserve consistency between the mixtures and the generated music. Quantitative results show that the proposed model can generate creative, high-fidelity samples of multitrack symbolic music.
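To make the idea of "explicitly guiding the diffusion model to preserve consistency" concrete, the following is a minimal NumPy sketch, not the paper's actual method: after each reverse-diffusion step, the per-track piano rolls are nudged so that their sum reproduces the observed mixture. The shapes, the `denoise_step` placeholder, and the equal-share projection in `project_to_mixture` are all illustrative assumptions.

```python
# A minimal sketch (NumPy, hypothetical shapes and helpers) of consistency-guided
# reverse diffusion for instrument separation. Not the paper's exact guidance rule.
import numpy as np

rng = np.random.default_rng(0)
T, K, P, L = 50, 4, 88, 64           # diffusion steps, tracks, pitches, time frames
mixture = rng.random((P, L))          # observed unlabeled mixture (piano-roll form assumed)

def denoise_step(x, t):
    """Placeholder for the learned reverse-diffusion update at step t."""
    return x - 0.01 * rng.standard_normal(x.shape)

def project_to_mixture(x, mixture):
    """Distribute the residual equally across tracks so the track sum matches
    the mixture (one possible notion of 'consistency')."""
    residual = mixture - x.sum(axis=0)
    return x + residual / x.shape[0]

x = rng.standard_normal((K, P, L))    # start from noise over K instrument tracks
for t in reversed(range(T)):
    x = denoise_step(x, t)            # model-driven denoising
    x = project_to_mixture(x, mixture)  # explicit consistency guidance

assert np.allclose(x.sum(axis=0), mixture)  # separated tracks sum back to the mixture
```

In this sketch the guidance is a hard projection applied at every step; a softer alternative would add a gradient of a mixture-reconstruction loss to each update instead.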