In this work, we study the association between song lyrics and mood through a data-driven analysis. Our data set consists of nearly one million songs, with song-mood associations derived from user playlists on the Spotify streaming platform. We use state-of-the-art transformer-based natural language processing models to learn the association between lyrics and moods. We find that a pretrained transformer-based language model in a zero-shot setting (i.e., out of the box, with no further training on our data) is powerful for capturing song-mood associations. Moreover, we show that training on song-mood associations yields a highly accurate model that predicts these associations for unseen songs. Furthermore, by comparing the predictions of a model using lyrics with those of a model using acoustic features, we observe that the relative importance of lyrics versus acoustics for mood prediction depends on the specific mood. Finally, we test whether the models capture the same information about lyrics and acoustics as humans do, through an annotation task in which we obtain human judgments of mood-song relevance based on lyrics and acoustics.