Mood recognition is an important problem in music informatics with key applications in music discovery and recommendation, applications that have become even more relevant with the rise of music streaming. Our work investigates whether readily available audio metadata, such as artist and year, can be leveraged to improve the performance of mood classification models. To this end, we propose a multi-task learning approach in which a shared model is simultaneously trained on mood and metadata prediction tasks, with the goal of learning richer representations. Experimentally, we demonstrate that applying our technique to existing state-of-the-art convolutional neural networks for mood classification consistently improves their performance. We conduct experiments on multiple datasets and report that our approach improves average precision by up to 8.7 points.
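The abstract does not specify the architecture, so the following is only a minimal sketch of the kind of multi-task setup it describes: a shared convolutional encoder over spectrogram input with a mood head and auxiliary metadata heads (artist and year), trained with a weighted sum of the task losses. All layer sizes, task dimensions, and the 0.5 auxiliary loss weight are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class MultiTaskMoodNet(nn.Module):
    """Shared conv trunk with separate heads for mood and metadata tasks."""
    def __init__(self, n_moods=56, n_artists=1000, n_year_bins=10):
        super().__init__()
        # Shared encoder over log-mel spectrograms (batch, 1, mels, frames)
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32),
            nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64),
            nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Task-specific heads on top of the shared representation
        self.mood_head = nn.Linear(64, n_moods)      # multi-label mood tags
        self.artist_head = nn.Linear(64, n_artists)  # auxiliary: artist ID
        self.year_head = nn.Linear(64, n_year_bins)  # auxiliary: year bin

    def forward(self, x):
        z = self.encoder(x)
        return self.mood_head(z), self.artist_head(z), self.year_head(z)

# One joint training step on dummy data; the 0.5 weight on the
# auxiliary losses is an assumed hyperparameter.
model = MultiTaskMoodNet()
bce = nn.BCEWithLogitsLoss()   # moods are multi-label
ce = nn.CrossEntropyLoss()     # artist / year are single-label
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

spec = torch.randn(8, 1, 96, 128)              # dummy log-mel batch
mood_y = torch.randint(0, 2, (8, 56)).float()  # dummy multi-hot moods
artist_y = torch.randint(0, 1000, (8,))
year_y = torch.randint(0, 10, (8,))

mood_logits, artist_logits, year_logits = model(spec)
loss = bce(mood_logits, mood_y) + 0.5 * (
    ce(artist_logits, artist_y) + ce(year_logits, year_y)
)
opt.zero_grad()
loss.backward()
opt.step()
```

At inference time only `mood_head` would be used; the metadata heads exist purely to shape the shared representation during training, which is the mechanism the abstract credits for the performance gains.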