Evaluating Gender Bias in Hindi-English Machine Translation

With language models increasingly deployed in the real world, it is essential to address the fairness of their outputs. The word embedding representations of these language models often implicitly draw unwanted associations that form a social bias within the model. The nature of gendered languages like Hindi poses an additional problem for the quantification and mitigation of bias, because the forms of the words in a sentence change with the gender of the subject. Additionally, little work has been done on measuring and debiasing systems for Indic languages. In our work, we attempt to evaluate and quantify the gender bias within a Hindi-English machine translation system. We implement a modified version of the existing TGBI metric based on the grammatical considerations for Hindi. We also compare and contrast the resulting bias measurements across multiple metrics for pre-trained embeddings and the ones learned by our machine translation model.

