Most music emotion recognition approaches use one-way classification or regression that estimates a general emotion from a distribution of music samples, without considering emotional variations (e.g., happiness can be further categorised into strong, moderate, or mild happiness). We propose a cross-modal music emotion recognition approach that associates music samples with emotions in a common embedding space by considering both their general and specific characteristics. Since the association of music samples with emotions is uncertain due to subjective human perception, we compute embeddings with a composite loss that maximises two statistical characteristics: the correlation between music samples and emotions based on canonical correlation analysis, and a probabilistic similarity between a music sample and an emotion based on KL divergence. Experiments on two benchmark datasets demonstrate the superiority of our approach over one-way baselines. In addition, a detailed analysis shows that our approach achieves robust cross-modal music emotion recognition, not only identifying music samples that match a specific emotion but also detecting the emotions expressed in a given music sample.
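The abstract only names the two loss terms; the sketch below is a hypothetical illustration of how such a composite objective could be assembled, not the paper's actual formulation. It pairs a soft CCA term (the trace norm of the whitened cross-covariance, a standard differentiable surrogate for the sum of canonical correlations) with a KL-divergence term between the two modalities' predictive distributions. All names (`cca_correlation`, `composite_loss`, `alpha`) and the PyTorch framing are assumptions.

```python
import torch
import torch.nn.functional as F

def cca_correlation(X, Y, eps=1e-6):
    """Sum of canonical correlations between mini-batches X, Y of shape (batch, dim).

    Higher is better, so the composite loss negates this term.
    """
    X = X - X.mean(dim=0, keepdim=True)  # centre each modality
    Y = Y - Y.mean(dim=0, keepdim=True)
    n = X.shape[0]
    # Covariance estimates with a small ridge for numerical stability.
    Sxx = X.T @ X / (n - 1) + eps * torch.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + eps * torch.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):  # inverse matrix square root via eigendecomposition
        w, V = torch.linalg.eigh(S)
        return V @ torch.diag(w.clamp_min(eps).rsqrt()) @ V.T

    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    # The singular values of T are the canonical correlations.
    return torch.linalg.svdvals(T).sum()

def composite_loss(music_emb, emotion_emb, music_logits, emotion_logits, alpha=0.5):
    """Hypothetical composite loss: maximise the CCA correlation between the
    embedded modalities while pulling the music-side emotion distribution
    towards the emotion-side one via KL divergence."""
    corr = cca_correlation(music_emb, emotion_emb)
    log_p = F.log_softmax(music_logits, dim=-1)     # music-side distribution (log)
    q = F.softmax(emotion_logits, dim=-1)           # emotion-side distribution
    kl = F.kl_div(log_p, q, reduction="batchmean")  # KL(q || p)
    return -corr + alpha * kl

# Toy usage: 8 music/emotion pairs in a 16-d common space, 4 emotion intensities.
music, emotion = torch.randn(8, 16), torch.randn(8, 16)
loss = composite_loss(music, emotion, torch.randn(8, 4), torch.randn(8, 4))
```

Negating the correlation term turns a quantity to be maximised into a loss; the weight `alpha` balancing the two terms would be a tuning choice, and the ridge `eps` keeps the covariance inverses well conditioned on small batches.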