We apply natural language processing techniques to analyse 377,808 English song lyrics from the "Two Million Song Database" corpus, focusing on the expression of sexism across five decades (1960-2010) and on the measurement of gender biases. Using a sexism classifier, we identify sexist lyrics at a larger scale than previous studies, which relied on small samples of manually annotated popular songs. Furthermore, we reveal gender biases by measuring associations in word embeddings learned on song lyrics. We find that sexist content increases over time, especially in songs by male artists and in popular songs appearing in Billboard charts. Songs also contain different language biases depending on the gender of the performer, with songs by male solo artists containing more and stronger biases. This is the first large-scale analysis of this type, giving insights into language usage in such an influential part of popular culture.
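The abstract mentions measuring associations in word embeddings without giving the formula. One common way to do this is a WEAT-style score: the mean cosine similarity of a target word to one attribute set minus its mean similarity to another. The sketch below illustrates that general idea only. It is not taken from the paper, and the function names and the three-dimensional toy vectors are hypothetical stand-ins for embeddings that would in practice be learned from the lyrics corpus.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(word_vec, attrs_a, attrs_b):
    """WEAT-style association of one word with attribute set A versus set B.

    Positive values mean the word sits closer to A, negative values closer to B.
    """
    mean_a = np.mean([cosine(word_vec, a) for a in attrs_a])
    mean_b = np.mean([cosine(word_vec, b) for b in attrs_b])
    return float(mean_a - mean_b)


# Hypothetical toy embeddings (assumption: real vectors would come from a
# model trained on song lyrics and have far more dimensions).
toy = {
    "he": np.array([1.0, 0.0, 0.0]),
    "him": np.array([0.9, 0.1, 0.0]),
    "she": np.array([0.0, 1.0, 0.0]),
    "her": np.array([0.1, 0.9, 0.0]),
    "target": np.array([0.8, 0.2, 0.1]),
}

male_terms = [toy["he"], toy["him"]]
female_terms = [toy["she"], toy["her"]]

score = association(toy["target"], male_terms, female_terms)
print(f"association(target; male vs. female) = {score:.3f}")
```

Under these assumptions, comparing such scores between embeddings trained on different subsets of lyrics (for example, by performer gender) is one way the strength of a bias could be contrasted across groups.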