We present Mirable's submission to the 2021 Emotions and Themes in Music challenge. In this work, we address the question: can we leverage semi-supervised learning techniques for music emotion recognition? To that end, we experiment with noisy student training, which has improved model performance in the image classification domain. As the noisy student method requires a strong teacher model, we further examine two factors for boosting the teacher's performance: (i) the input training length and (ii) complementary music representations. For (i), we find that models trained with short input lengths perform better in PR-AUC, whereas those trained with long input lengths perform better in ROC-AUC. For (ii), we find that using harmonic pitch class profiles (HPCP) consistently improves tagging performance, which suggests that harmonic representations are useful for music emotion tagging. Finally, we find that the noisy student method improves tagging results only for long training lengths. Additionally, we find that ensembling representations trained with different input lengths improves tagging results significantly, which suggests that incorporating multiple temporal resolutions into the network architecture is a promising direction for future work.
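Since the abstract only names the noisy student procedure, a minimal sketch of one teacher-student round for multi-label music tagging may help. This is an illustration under assumptions, not the authors' implementation: the tagger architecture, data loaders, Gaussian input noise, and the helper names (`pseudo_label`, `train_student`) are all hypothetical, and in practice the noise would likely be audio-domain augmentation rather than additive Gaussian noise.

```python
# Sketch of one noisy student round for multi-label music tagging (PyTorch).
# Assumes a tagger mapping input batches to tag logits, a labeled_loader
# yielding (x, y) batches with multi-hot float targets, and an
# unlabeled_loader yielding x batches. All hyper-parameters are illustrative.
import copy

import torch
import torch.nn.functional as F


def pseudo_label(teacher, unlabeled_loader, device="cpu"):
    """Tag unlabeled clips with the un-noised teacher; keep soft labels."""
    teacher.eval()
    pairs = []
    with torch.no_grad():
        for x in unlabeled_loader:
            x = x.to(device)
            pairs.append((x.cpu(), torch.sigmoid(teacher(x)).cpu()))
    return pairs


def train_student(teacher, labeled_loader, unlabeled_loader,
                  epochs=10, lr=1e-3, device="cpu"):
    """Train a noised student on labeled plus teacher-pseudo-labeled data."""
    student = copy.deepcopy(teacher)   # equal-size student for simplicity
    student.train()                    # dropout in train mode acts as model noise
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    pseudo = pseudo_label(teacher, unlabeled_loader, device)
    for _ in range(epochs):
        for x, y in list(labeled_loader) + pseudo:
            x, y = x.to(device), y.to(device)
            x = x + 0.01 * torch.randn_like(x)  # placeholder input noise
            loss = F.binary_cross_entropy_with_logits(student(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student  # can serve as the teacher for the next round
```

In the full noisy student scheme, the returned student replaces the teacher and the round is repeated; the key asymmetry is that the teacher labels data without noise while the student learns under noise.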