While supervised learning has enabled great advances in many areas of music, labeled music datasets remain especially hard, expensive, and time-consuming to create. In this work, we introduce SimCLR to the music domain and contribute a large chain of audio data augmentations, to form a simple framework for self-supervised learning of raw waveforms of music: CLMR. This approach requires no manual labeling and no preprocessing of music to learn useful representations. We evaluate CLMR in the downstream task of music classification on the MagnaTagATune and Million Song datasets. A linear classifier fine-tuned on representations from a pre-trained CLMR model achieves an average precision of 35.4% on the MagnaTagATune dataset, surpassing fully supervised models that currently achieve a score of 34.9%. Moreover, we show that CLMR's representations are transferable using out-of-domain datasets, indicating that they capture important musical knowledge. Lastly, we show that self-supervised pre-training allows us to learn efficiently on smaller labeled datasets: we still achieve a score of 33.1% despite using only 259 labeled songs during fine-tuning. To foster reproducibility and future research on self-supervised learning in music, we publicly release the pre-trained models and the source code of all experiments of this paper on GitHub.
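The contrastive objective underlying SimCLR-style pre-training can be illustrated with a short sketch. The following is a minimal NumPy implementation of the NT-Xent (normalized temperature-scaled cross-entropy) loss that SimCLR optimizes, not the paper's actual code; the function name, temperature value, and array shapes are illustrative assumptions:

```python
import numpy as np

def nt_xent_loss(z_i, z_j, temperature=0.5):
    """NT-Xent loss over two batches of embeddings.

    z_i, z_j: (N, D) arrays holding embeddings of two differently
    augmented views of the same N audio clips. Each clip's two views
    form a positive pair; all other 2N - 2 embeddings act as negatives.
    """
    n = z_i.shape[0]
    z = np.concatenate([z_i, z_j], axis=0)             # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    # the positive partner of row k is row (k + N) mod 2N
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # cross-entropy of each row's softmax against its positive partner
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), targets].mean()
```

Minimizing this loss pulls the two augmented views of each clip together in embedding space while pushing apart views of different clips, which is what lets the encoder learn useful representations without any labels.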