We employ Natural Language Processing techniques to analyse 377,808 English song lyrics from the "Two Million Song Database" corpus, focusing on the expression of sexism across five decades (1960-2010) and the measurement of gender biases. Using a sexism classifier, we identify sexist lyrics at a larger scale than previous studies, which relied on small samples of manually annotated popular songs. Furthermore, we reveal gender biases by measuring associations in word embeddings learned on song lyrics. We find that sexist content increases over time, especially in songs by male artists and in popular songs appearing in Billboard charts. Songs are also shown to contain different language biases depending on the gender of the performer, with songs by male solo artists containing more and stronger biases. This is the first large-scale analysis of this type, giving insights into language usage in such an influential part of popular culture.
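To make the idea of "measuring associations in word embeddings" concrete, the following is a minimal sketch of one common association measure, a WEAT-style effect size. The abstract does not specify the exact measure used, so the choice of this statistic is an assumption, as are the function names, the word lists, and the two-dimensional toy vectors, which are invented for illustration and are not taken from the corpus.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word, attrs_a, attrs_b, emb):
    """Mean similarity of `word` to attribute set A minus that to set B."""
    sim_a = mean(cosine(emb[word], emb[a]) for a in attrs_a)
    sim_b = mean(cosine(emb[word], emb[b]) for b in attrs_b)
    return sim_a - sim_b


def effect_size(targets_x, targets_y, attrs_a, attrs_b, emb):
    """WEAT-style effect size: difference of mean associations of the two
    target sets, divided by the sample standard deviation over all targets."""
    s_x = [association(x, attrs_a, attrs_b, emb) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b, emb) for y in targets_y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy embedding (assumption): real vectors would be learned
# on the lyrics and have hundreds of dimensions.
toy_emb = {
    "he": (1.0, 0.0), "him": (0.9, 0.1),
    "she": (0.0, 1.0), "her": (0.1, 0.9),
    "career": (0.8, 0.2), "salary": (0.7, 0.3),
    "family": (0.2, 0.8), "home": (0.3, 0.7),
}

male_terms, female_terms = ["he", "him"], ["she", "her"]
career_terms, family_terms = ["career", "salary"], ["family", "home"]

score = effect_size(career_terms, family_terms, male_terms, female_terms, toy_emb)
print(f"effect size (career vs. family, male vs. female): {score:.3f}")
```

A positive score here would indicate that the first target set sits closer to the first attribute set than the second target set does. Comparing such scores between embeddings trained separately on lyrics by male and by female performers is one way the per-gender comparison described above could be operationalised.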