While the prevalence of large pre-trained language models has led to significant improvements in the performance of NLP systems, recent research has demonstrated that these models inherit societal biases present in natural language. In this paper, we explore a simple method to probe pre-trained language models for gender bias, which we use to carry out a multilingual study of gender bias towards politicians. We construct a dataset of 250k politicians from most countries in the world and quantify adjective and verb usage around those politicians' names as a function of their gender. We conduct our study in 7 languages across 6 different language modeling architectures. Our results demonstrate that stance towards politicians in pre-trained language models is highly dependent on the language used. Finally, contrary to previous findings, our study suggests that larger language models do not tend to be significantly more gender-biased than smaller ones.
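To make the probing idea concrete, below is a minimal sketch of one way to compare a masked language model's adjective predictions around politicians' names of different genders. This is an illustration under assumptions, not the paper's exact pipeline: the model name (`bert-base-multilingual-cased`), the "`<name> is [MASK].`" template, and the example names and adjective are all hypothetical choices.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Illustrative model choice; the paper studies 6 architectures in 7 languages.
MODEL_NAME = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

def adjective_log_prob(name: str, adjective: str) -> float:
    """Log-probability the model assigns to `adjective` at the masked
    position in the template "<name> is [MASK].".

    Assumes the adjective is a single token in the model's vocabulary;
    multi-token adjectives would require multiple mask positions.
    """
    text = f"{name} is {tokenizer.mask_token}."
    inputs = tokenizer(text, return_tensors="pt")
    # Locate the [MASK] position in the tokenized input.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    with torch.no_grad():
        logits = model(**inputs).logits
    log_probs = torch.log_softmax(logits[0, mask_pos], dim=-1)
    adj_id = tokenizer.convert_tokens_to_ids(adjective)
    if adj_id == tokenizer.unk_token_id:
        raise ValueError(f"{adjective!r} is not a single token in this vocabulary")
    return log_probs[adj_id].item()

# Hypothetical example: score the same adjective across two names.
for name in ["Angela Merkel", "Barack Obama"]:
    print(name, adjective_log_prob(name, "beautiful"))
```

Aggregating such scores over many names of each gender, and over many adjectives and verbs, would yield the kind of gender-conditioned usage statistics the abstract describes.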