We employ Natural Language Processing techniques to analyse 377,808 English song lyrics from the "Two Million Song Database" corpus, focusing on the expression of sexism across five decades (1960-2010) and on the measurement of gender biases. Using a sexism classifier, we identify sexist lyrics at a larger scale than previous studies, which relied on small samples of manually annotated popular songs. Furthermore, we reveal gender biases by measuring associations in word embeddings learned on song lyrics. We find that sexist content increases over time, especially in songs by male artists and in popular songs appearing in the Billboard charts. Songs are also shown to contain different language biases depending on the gender of the performer, with songs by male solo artists containing more and stronger biases. This is the first large-scale analysis of this type, giving insights into language usage in such an influential part of popular culture.
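To illustrate what "measuring associations in word embeddings" can look like in practice, the following is a minimal sketch of a WEAT-style effect size. The abstract does not state which association measure is used, so the choice of WEAT here is an assumption. The function names and the two-dimensional toy vectors are hypothetical and stand in for embeddings trained on lyrics.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(word_vec, attrs_a, attrs_b):
    """Mean similarity of a word to attribute set A minus its mean similarity to set B."""
    sim_a = np.mean([cosine(word_vec, a) for a in attrs_a])
    sim_b = np.mean([cosine(word_vec, b) for b in attrs_b])
    return float(sim_a - sim_b)

def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference of mean associations of the two target sets, scaled by the pooled std."""
    sx = [association(x, attrs_a, attrs_b) for x in targets_x]
    sy = [association(y, attrs_a, attrs_b) for y in targets_y]
    pooled_std = np.std(sx + sy, ddof=1)
    return float((np.mean(sx) - np.mean(sy)) / pooled_std)

# Hypothetical toy embeddings (NOT taken from the paper): in a real analysis these
# would be vectors for word lists looked up in a model trained on song lyrics.
male_terms = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]
female_terms = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]
career_attrs = [np.array([1.0, 0.0]), np.array([0.8, 0.1])]
family_attrs = [np.array([0.0, 1.0]), np.array([0.1, 0.8])]

effect = weat_effect_size(male_terms, female_terms, career_attrs, family_attrs)
print(f"effect size: {effect:.3f}")
```

A positive effect size indicates that the first target set is more strongly associated with the first attribute set than the second target set is. Swapping the two target sets flips the sign.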