Translating "The secretary asked for details." into a language with grammatical gender may require determining the gender of the subject, "secretary". If the sentence does not contain the necessary information, disambiguation is not always possible. In such cases, machine translation systems select the most common translation option, which often corresponds to the stereotypical translation, thus potentially exacerbating prejudice against and marginalisation of certain groups and people. We argue that the information necessary for an adequate translation cannot always be deduced from the sentence being translated and might even depend on external knowledge. Therefore, in this work, we propose to decouple the task of acquiring the necessary information from the task of learning to translate correctly when such information is available. To that end, we present a method for training machine translation systems to use word-level annotations containing information about the subject's gender. To prepare training data, we annotate regular source language words with the grammatical gender information of the corresponding target language words. Using such data to train machine translation systems reduces their reliance on gender stereotypes when information about the subject's gender is available. Our experiments on five language pairs show that this improves accuracy on the WinoMT test set by up to 25.8 percentage points.
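The data preparation step described above can be illustrated with a minimal sketch. It assumes that a word alignment and per-word grammatical gender tags for the target sentence are already available (for example, from an external aligner and a morphological tagger). The function name `annotate_source`, the `token|TAG` format, and the tag inventory (`F`, `M`, `N`, and `U` for unknown) are hypothetical choices made for illustration, not necessarily the format used in this work.

```python
from typing import Dict, List, Optional

UNKNOWN = "U"  # assumed tag for words with no gender information


def annotate_source(
    source_tokens: List[str],
    alignment: Dict[int, int],
    target_genders: List[Optional[str]],
) -> List[str]:
    """Attach to each source token the gender tag of its aligned target word.

    alignment maps a source token index to a target token index.
    target_genders holds one tag per target token, or None if the
    target word carries no grammatical gender.
    """
    annotated = []
    for i, token in enumerate(source_tokens):
        j = alignment.get(i)  # None if the source token is unaligned
        gender = target_genders[j] if j is not None else None
        annotated.append(f"{token}|{gender or UNKNOWN}")
    return annotated


# Hand-written illustrative input: the target side is assumed to use a
# feminine form for "secretary"; alignment and tags are made up by hand.
source = ["The", "secretary", "asked", "for", "details", "."]
alignment = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
target_genders = ["F", "F", None, None, None, None]

print(" ".join(annotate_source(source, alignment, target_genders)))
```

At translation time, the same annotation slots could be filled from whatever external source supplies the subject's gender, which is what allows information acquisition to be decoupled from translation.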