We employ Natural Language Processing techniques to analyse 377,808 English song lyrics from the "Two Million Song Database" corpus, focusing on the expression of sexism across five decades (1960-2010) and the measurement of gender biases. Using a sexism classifier, we identify sexist lyrics at a larger scale than previous studies, which relied on small samples of manually annotated popular songs. Furthermore, we reveal gender biases by measuring associations in word embeddings learned on song lyrics. We find that sexist content increases over time, especially in songs by male artists and in popular songs appearing in Billboard charts. Songs are also shown to contain different language biases depending on the gender of the performer, with songs by male solo artists containing more and stronger biases. This is the first large-scale analysis of this type, giving insights into language usage in such an influential part of popular culture.