We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. This is achieved with an encoder pre-trained with a contrastive objective to extract only audio-effects-related information from a reference music recording. All our models are trained in a self-supervised manner on an already-processed wet multitrack dataset, using an effective data preprocessing method that alleviates the scarcity of unprocessed dry data. We analyze the proposed encoder's ability to disentangle audio effects and validate its performance on mixing style transfer through both objective and subjective evaluations. The results show that the proposed system not only converts the mixing style of multitrack audio close to that of a reference but also remains robust for mixture-wise style transfer when used with a music source separation model.