Most music emotion recognition approaches perform classification or regression to estimate a general emotional category from a distribution of music samples, without considering emotional variations within a category (e.g., happiness can be further graded into strong, moderate, or mild happiness). We propose an embedding-based music emotion recognition approach that associates music samples with emotions in a common embedding space, considering both general emotional categories and fine-grained discrimination within each category. Because the association of music samples with emotions is uncertain owing to subjective human perception, we learn embeddings with a composite loss that maximises two statistical characteristics: the correlation between music samples and emotions, based on canonical correlation analysis (CCA), and a probabilistic similarity between a music sample and an emotion, measured by KL divergence. Experiments on two benchmark datasets demonstrate the effectiveness of our embedding-based approach, the composite loss, and the learned acoustic features. In addition, detailed analysis shows that our approach accomplishes robust bidirectional music emotion recognition, which not only identifies music samples that match a specific emotion but also detects the emotions expressed in a given music sample.
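The two terms of the composite loss can be illustrated with a minimal sketch. This is not the paper's actual formulation: it uses a single Pearson correlation as a stand-in for the CCA-based correlation term, a softmax to turn embeddings into distributions for the KL term, and a hypothetical weighting parameter `alpha`; all function names are illustrative.

```python
import numpy as np

def correlation_loss(x, y):
    """Negative Pearson correlation between a paired music/emotion
    embedding (a one-dimensional surrogate for the CCA-based term)."""
    xc, yc = x - x.mean(), y - y.mean()
    corr = (xc * yc).sum() / (np.linalg.norm(xc) * np.linalg.norm(yc) + 1e-8)
    return -corr  # maximising correlation == minimising its negative

def softmax(z):
    """Map an embedding to a probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_divergence(p, q):
    """KL divergence between two discrete distributions p and q."""
    return float((p * np.log(p / q)).sum())

def composite_loss(music_emb, emotion_emb, alpha=0.5):
    """Weighted sum of the correlation term and the KL term.
    `alpha` is a hypothetical trade-off weight, not from the paper."""
    p, q = softmax(music_emb), softmax(emotion_emb)
    return alpha * correlation_loss(music_emb, emotion_emb) \
        + (1 - alpha) * kl_divergence(p, q)

# A perfectly matched pair yields correlation 1 and KL 0,
# so the loss is lower than for a mismatched pair.
x = np.array([0.2, 1.0, -0.3])
print(composite_loss(x, x) < composite_loss(x, -x))
```

Under this sketch, minimising the composite loss pulls matched music/emotion pairs together in the shared space along both statistical criteria at once.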