Music has the power to evoke intense emotional experiences and regulate an individual's mood. With the advent of online streaming services, research in music recommendation has seen tremendous progress. Modern methods that leverage users' listening histories for session-based song recommendation have overlooked the significance of features extracted from lyrics and acoustic content. We address the task of song prediction through multiple modalities, including tags, lyrics, and acoustic content. In this paper, we propose a novel deep learning approach that refines Attentive Neural Networks using representations derived via a Transformer model for lyrics and a Variational Autoencoder for acoustic features. Using lyrical and acoustic features alone, our model achieves a significant performance improvement over existing state-of-the-art models. Furthermore, we conduct a study to investigate the impact of users' psychological health on our model's performance.
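To make the described architecture concrete, below is a minimal, illustrative PyTorch sketch of the multimodal fusion: Transformer-encoded lyric representations are concatenated with VAE acoustic latents, and attention over the listening session produces next-song scores. All names (e.g., SessionSongPredictor, AcousticVAEEncoder), layer sizes, and the fusion and prediction strategy are assumptions made for illustration, not the authors' exact model.

```python
# Illustrative sketch only; dimensions and fusion strategy are assumptions.
import torch
import torch.nn as nn

class AcousticVAEEncoder(nn.Module):
    """Encoder half of a VAE: maps acoustic features to a latent vector."""
    def __init__(self, in_dim=128, latent_dim=64):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)

    def forward(self, x):
        h = self.hidden(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar

class SessionSongPredictor(nn.Module):
    """Attends over a session of songs, each represented by
    Transformer-encoded lyrics fused with VAE acoustic latents."""
    def __init__(self, vocab_size=10000, n_songs=5000,
                 lyric_dim=128, acoustic_dim=128, latent_dim=64):
        super().__init__()
        self.lyric_embed = nn.Embedding(vocab_size, lyric_dim)
        layer = nn.TransformerEncoderLayer(d_model=lyric_dim, nhead=4,
                                           batch_first=True)
        self.lyric_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.acoustic_encoder = AcousticVAEEncoder(acoustic_dim, latent_dim)
        fused_dim = lyric_dim + latent_dim
        # Attention over the songs in the listening session.
        self.session_attn = nn.MultiheadAttention(fused_dim, num_heads=4,
                                                  batch_first=True)
        self.out = nn.Linear(fused_dim, n_songs)

    def forward(self, lyric_tokens, acoustic_feats):
        # lyric_tokens: (batch, session_len, n_tokens)
        # acoustic_feats: (batch, session_len, acoustic_dim)
        b, s, t = lyric_tokens.shape
        tok = self.lyric_embed(lyric_tokens.view(b * s, t))
        # Mean-pool token representations into one vector per song.
        lyric_repr = self.lyric_encoder(tok).mean(dim=1).view(b, s, -1)
        z, _, _ = self.acoustic_encoder(acoustic_feats)
        songs = torch.cat([lyric_repr, z], dim=-1)   # (b, s, fused_dim)
        attended, _ = self.session_attn(songs, songs, songs)
        # Score candidate next songs from the last attended position.
        return self.out(attended[:, -1])

# Smoke test on random inputs: 2 sessions of 8 songs, 30 lyric tokens each.
model = SessionSongPredictor()
scores = model(torch.randint(0, 10000, (2, 8, 30)), torch.randn(2, 8, 128))
print(scores.shape)  # torch.Size([2, 5000])
```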