While deep learning has enabled great advances in many areas of music, labeled music datasets remain especially hard, expensive, and time-consuming to create. In this work, we introduce SimCLR to the music domain and contribute a large chain of audio data augmentations to form a simple framework for self-supervised, contrastive learning of musical representations: CLMR. This approach works on raw time-domain music data and requires no labels to learn useful representations. We evaluate CLMR in the downstream task of music classification on the MagnaTagATune and Million Song datasets and present an ablation study to test which of our music-related innovations over SimCLR are most effective. A linear classifier trained on the proposed representations achieves a higher average precision than supervised models on the MagnaTagATune dataset, and performs comparably on the Million Song dataset. Moreover, we show that CLMR's representations are transferable using out-of-domain datasets, indicating that our method has strong generalisability in music classification. Lastly, we show that the proposed method allows data-efficient learning on smaller labeled datasets: we achieve an average precision of 33.1% despite using only 259 labeled songs in the MagnaTagATune dataset (1% of the full dataset) during linear evaluation. To foster reproducibility and future research on self-supervised learning in music, we publicly release the pre-trained models and the source code of all experiments of this paper.
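The contrastive objective underlying SimCLR-style methods such as CLMR can be made concrete with a small sketch. The following is a minimal numpy implementation of the NT-Xent (normalized temperature-scaled cross-entropy) loss: two augmented views of each audio clip are embedded, and each embedding is pulled toward its counterpart view while being pushed away from all other samples in the batch. The function name, shapes, and default temperature here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def nt_xent_loss(z_i, z_j, temperature=0.5):
    """NT-Xent loss over a batch of paired embeddings.

    z_i, z_j: (N, D) arrays holding the embeddings of two differently
    augmented views of the same N audio clips. The temperature value
    0.5 is an illustrative default, not the paper's setting.
    """
    n = z_i.shape[0]
    z = np.concatenate([z_i, z_j], axis=0)              # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # L2-normalise rows
    sim = (z @ z.T) / temperature                       # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                      # exclude self-similarity
    # The positive for sample k is the other view of the same clip: k + N (mod 2N).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

When the two views embed identically, the positive pairs sit at the maximum cosine similarity and the loss is low; mismatched views drive it up, which is what rewards augmentation-invariant representations.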