Gender is widely discussed in the context of language tasks and when examining the stereotypes propagated by language models. However, current discussions primarily treat gender as binary, which can perpetuate harms such as the cyclical erasure of non-binary gender identities. These harms are driven by model and dataset biases, which are consequences of the non-recognition and lack of understanding of non-binary genders in society. In this paper, we explain the complexity of gender and the language around it, and survey non-binary persons to understand harms associated with the treatment of gender as binary in English language technologies. We also detail how current language representations (e.g., GloVe, BERT) capture and perpetuate these harms, and the related challenges that need to be acknowledged and addressed for representations to equitably encode gender information.