Human gender bias is reflected in language and text production. Because state-of-the-art machine translation (MT) systems are trained on large corpora of mostly human-generated text, gender bias can also be found in MT. For instance, when occupations are translated from a language such as English, which mostly uses gender-neutral words, into a language such as German, which mostly has a feminine and a masculine form for each occupation, the MT system must make a decision. Recent research has shown that MT systems are biased towards stereotypical translations of occupations. In 2019, the first, and so far only, challenge set explicitly designed to measure the extent of gender bias in MT systems was published. In this set, the measurement of gender bias is based solely on the translation of occupations. In this paper, we present an extension of this challenge set, called WiBeMT, which adds sentences with gender-biased adjectives and sentences with gender-biased verbs. The resulting challenge set consists of over 70,000 sentences and has been translated with three commercial MT systems: DeepL Translator, Microsoft Translator, and Google Translate. The results show a gender bias for all three MT systems. This gender bias is significantly influenced to a great extent by adjectives and to a lesser extent by verbs.