Pre-trained language models are known to perpetuate biases from their underlying training datasets to downstream tasks. However, these findings have been established predominantly for monolingual English models, and few studies have investigated the biases encoded in language models for languages beyond English. In this paper, we fill this gap by analysing gender bias in West Slavic language models. We introduce the first template-based dataset in Czech, Polish, and Slovak for measuring gender bias towards male, female, and non-binary subjects. We complete the sentences using both mono- and multilingual language models and assess their suitability for the masked language modelling objective. Next, we measure the gender bias encoded in West Slavic language models by quantifying the toxicity and genderness of the generated words. We find that these language models produce hurtful completions that depend on the subject's gender. Perhaps surprisingly, Czech, Slovak, and Polish language models produce more hurtful completions with men as subjects, which, upon inspection, we find is due to completions related to violence, death, and sickness.