A common goal of self-supervised learning is to extract a general representation that benefits arbitrary downstream tasks. In this work, we investigate music audio representations learned with different contrastive self-supervised learning schemes and empirically evaluate the resulting embedding vectors on various music information retrieval (MIR) tasks that concern different levels of music perception. We analyze the results to discuss appropriate directions for contrastive learning strategies across MIR tasks. We show that these representations convey comprehensive information about the auditory characteristics of music in general, although each self-supervision strategy is effective for its own particular aspects of that information.