Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. This is achieved with an encoder pre-trained with a contrastive objective to extract only audio-effects-related information from a reference music recording. All our models are trained in a self-supervised manner on an already-processed (wet) multitrack dataset, using a data preprocessing method that alleviates the scarcity of unprocessed (dry) data. We analyze the proposed encoder's ability to disentangle audio effects and validate its performance for mixing style transfer through both objective and subjective evaluations. The results show that the proposed system not only brings the mixing style of multitrack audio close to that of a reference, but also remains robust for mixture-wise style transfer when combined with a music source separation model.
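The abstract does not say which contrastive objective the encoder is trained with. As a rough illustration only, the sketch below uses the widely used InfoNCE loss and assumes that two segments with different musical content but the same effects chain form a positive pair, while other items in the batch act as negatives. The function name `info_nce_loss`, the temperature value, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE loss for a batch of (anchor, positive) embedding pairs.

    Assumption for this sketch: anchors[i] and positives[i] are embeddings of
    two different pieces of content processed with the same effects chain.
    Every positives[j] with j != i acts as a negative for anchors[i].
    """
    # L2-normalise so the dot product becomes cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarities scaled by the temperature, shape (batch, batch).
    logits = a @ p.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The correct "class" for row i is column i (its own positive).
    return float(-np.mean(np.diag(log_probs)))


# Toy data: each row of `styles` stands in for one effects chain.
rng = np.random.default_rng(0)
styles = rng.normal(size=(8, 16))
anchors = styles + 0.01 * rng.normal(size=(8, 16))
positives = styles + 0.01 * rng.normal(size=(8, 16))

matched = info_nce_loss(anchors, positives)
# Shifting the positives by one row pairs every anchor with the wrong style.
mismatched = info_nce_loss(anchors, np.roll(positives, 1, axis=0))
print(matched < mismatched)
```

Under this objective, embeddings that share an effects chain are pulled together regardless of musical content, which is the kind of disentanglement the abstract describes. A real encoder would compute the embeddings from audio rather than from synthetic vectors.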
