Music signals are difficult to interpret from their low-level features, perhaps even more so than images: highlighting part of a spectrogram or an image, for instance, is often insufficient to convey high-level ideas that are genuinely relevant to humans. In computer vision, concept learning was introduced to bring explanations to the right level of abstraction (e.g. detecting clinical concepts in radiographs). These methods have yet to be applied to MIR. In this paper, we adapt concept learning to the realm of music, with its particularities. For instance, music concepts are typically non-independent and of mixed nature (e.g. genre, instruments, mood), unlike previous work that assumed disentangled concepts. We propose a method to learn numerous music concepts from audio and then automatically hierarchise them to expose their mutual relationships. We conduct experiments on datasets of playlists from a music streaming service, which serve as a few annotated examples for diverse concepts. Evaluations show that the mined hierarchies align with ground-truth concept hierarchies when available, and with proxy sources of concept similarity in the general case.