Targeted evaluations have found that machine translation systems often output the wrong gender, even when the gender is clear from context. These incorrectly gendered translations can also reflect or amplify social biases. We propose a gender-filtered self-training technique to improve gender translation accuracy on unambiguously gendered inputs. The approach uses a source-side monolingual corpus and an initial model to generate gender-specific pseudo-parallel corpora, which are then added to the training data. We filter the gender-specific corpora on both the source and target sides to ensure that each sentence pair contains and correctly translates the specified gender. We evaluate our approach on translation from English into five languages and find that our models improve gender translation accuracy without any cost to generic translation quality. In addition, we show the viability of our approach in several settings, including re-training from scratch, fine-tuning, controlling the balance of the training data, forward translation, and back-translation.
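The filtering step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the filtering criteria, so the toy word lists (`SRC_GENDER_WORDS`, `TGT_GENDER_WORDS`), the English-to-Spanish direction, the helper names, and the `translate` callable standing in for the initial model are all hypothetical. The sketch covers the forward-translation setting only: a source sentence is kept if it mentions exactly the specified gender, and its machine translation is kept only if it also carries that gender.

```python
import re
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical toy lexicons (assumptions for illustration, not from the paper).
SRC_GENDER_WORDS: Dict[str, Set[str]] = {
    "feminine": {"she", "her", "hers", "woman", "mother"},
    "masculine": {"he", "him", "his", "man", "father"},
}
TGT_GENDER_WORDS: Dict[str, Set[str]] = {
    "feminine": {"ella", "doctora", "madre", "profesora"},
    "masculine": {"él", "doctor", "padre", "profesor"},
}


def tokens(text: str) -> Set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def has_only_gender(text: str, gender: str, lexicon: Dict[str, Set[str]]) -> bool:
    """True if the text contains a word of `gender` and no word of any other gender."""
    toks = tokens(text)
    has_target = bool(toks & lexicon[gender])
    has_other = any(toks & words for g, words in lexicon.items() if g != gender)
    return has_target and not has_other


def build_gender_corpus(
    sources: Iterable[str],
    translate: Callable[[str], str],
    gender: str,
) -> List[Tuple[str, str]]:
    """Forward-translate monolingual sources and keep only gender-consistent pairs."""
    kept: List[Tuple[str, str]] = []
    for src in sources:
        # Source-side filter: the input must unambiguously carry the gender.
        if not has_only_gender(src, gender, SRC_GENDER_WORDS):
            continue
        hyp = translate(src)  # pseudo-target produced by the initial model
        # Target-side filter: the translation must carry the same gender.
        if has_only_gender(hyp, gender, TGT_GENDER_WORDS):
            kept.append((src, hyp))
    return kept


if __name__ == "__main__":
    # A dictionary lookup stands in for the initial translation model.
    toy_model = {
        "She is a doctor.": "Ella es doctora.",
        "The doctor said she was tired.": "El doctor dijo que estaba cansado.",
        "He is a teacher.": "Él es profesor.",
    }
    print(build_gender_corpus(toy_model, toy_model.get, "feminine"))
```

In this toy run, the second sentence passes the source-side filter but is dropped on the target side because its translation is masculine, so only correctly gendered pairs would be added to the training data. A practical system would likely need morphological analysis rather than word lists on the target side.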