Contrastive learning is an emerging branch of self-supervised learning that leverages large amounts of unlabeled data by learning a latent space in which pairs of different views of the same sample are associated. In this paper, we propose musical source association as a pair generation strategy in the context of contrastive music representation learning. To this end, we modify COLA, a widely used contrastive learning framework for audio, so that it learns to associate a song excerpt with a stochastically selected, automatically extracted vocal or instrumental source. We further introduce a novel modification to the contrastive loss that incorporates information about the presence or absence of specific sources. Our experimental evaluation on three downstream tasks (music auto-tagging, instrument classification, and music genre classification), using the publicly available Magna-Tag-A-Tune (MTAT) dataset as the source dataset, yields results competitive with existing methods in the literature, as well as faster network convergence. The results also show that this pre-training method can be steered towards specific features, according to the selected musical source, while remaining dependent on the quality of the separated sources.