This paper presents a novel approach to music representation learning. Triplet-loss-based networks have become popular for representation learning in various multimedia retrieval domains. One of the most crucial parts of this approach is the appropriate selection of triplets, which is indispensable because the number of possible triplets grows cubically with the number of items. We present an approach that harnesses multi-tag annotations for triplet selection by using Latent Semantic Indexing to project the tags onto a high-dimensional space. From this projection we estimate tag-relatedness to select hard triplets. The approach is evaluated in a multi-task scenario, for which we introduce four large multi-tag annotation sets for the Million Song Dataset covering the music properties genres, styles, moods, and themes.
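To make the triplet-selection idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a small binary track-by-tag matrix, uses a truncated SVD as the Latent Semantic Indexing step, measures relatedness by cosine similarity, and picks for each anchor the least similar track that still counts as related and the most similar track that counts as unrelated. The toy data, the function names (`lsi_embed`, `cosine_similarity_matrix`, `select_hard_triplets`), the latent dimension `k=2`, and both thresholds are assumptions chosen for illustration.

```python
import numpy as np


def lsi_embed(tag_matrix, k):
    """Project each track (row) into a k-dimensional latent space via truncated SVD."""
    u, s, _ = np.linalg.svd(tag_matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.maximum(norms, 1e-12)  # guard against zero-length rows
    return unit @ unit.T


def select_hard_triplets(sim, pos_threshold, neg_threshold):
    """Return one (anchor, positive, negative) index triplet per anchor.

    Hard positive: least similar track with similarity >= pos_threshold.
    Hard negative: most similar track with similarity < neg_threshold.
    Anchors lacking either candidate are skipped.
    """
    n = sim.shape[0]
    triplets = []
    for a in range(n):
        not_anchor = np.arange(n) != a
        pos = np.where(not_anchor & (sim[a] >= pos_threshold))[0]
        neg = np.where(not_anchor & (sim[a] < neg_threshold))[0]
        if pos.size == 0 or neg.size == 0:
            continue
        p = pos[np.argmin(sim[a, pos])]
        q = neg[np.argmax(sim[a, neg])]
        triplets.append((a, int(p), int(q)))
    return triplets


# Hypothetical toy annotations; columns: rock, metal, guitar, jazz, piano.
tag_matrix = np.array([
    [1, 0, 1, 0, 0],  # track 0
    [1, 1, 1, 0, 0],  # track 1
    [0, 1, 1, 0, 0],  # track 2
    [0, 0, 0, 1, 1],  # track 3
    [0, 0, 0, 1, 0],  # track 4
], dtype=float)

embeddings = lsi_embed(tag_matrix, k=2)
similarity = cosine_similarity_matrix(embeddings)
triplets = select_hard_triplets(similarity, pos_threshold=0.8, neg_threshold=0.5)
print(triplets)
```

Selecting one hard positive and one hard negative per anchor keeps the number of triplets linear in the number of tracks instead of cubic. In a real setting the tag matrix would typically be weighted (for example with TF-IDF) and the thresholds tuned on held-out data.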