The scientific community is increasingly aware of the need to embrace pluralism and to consistently represent both major and minor social groups. Currently, there are no standard evaluation techniques for different types of bias. Accordingly, there is an urgent need for evaluation sets and protocols that measure the biases present in our automatic systems. Evaluating these biases should be an essential step towards mitigating them. This paper introduces WinoST, a new, freely available challenge set for evaluating gender bias in speech translation. WinoST is the speech version of WinoMT, an MT challenge set, and both follow an evaluation protocol that measures gender accuracy. Using a state-of-the-art end-to-end speech translation system, we report the gender bias evaluation on four language pairs and show that gender accuracy in speech translation is more than 23% lower than in MT.
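The abstract does not spell out how gender accuracy is computed. As a minimal sketch, assume it is the percentage of annotated entities whose gender in the system output matches the reference label. The function name `gender_accuracy` and the label strings are hypothetical, and the step that extracts a gender from each translation (for example, via alignment and morphological analysis) is not shown.

```python
from typing import Sequence


def gender_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the percentage of entities whose predicted gender matches gold.

    Assumption: one label per annotated entity, e.g. "female" or "male".
    This is an illustrative definition, not the paper's released scorer.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one entity is required")
    # Count entities where the translated gender agrees with the reference.
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


# Toy labels, invented purely for illustration (not results from the paper).
gold_labels = ["female", "male", "female", "male"]
system_labels = ["female", "male", "male", "male"]
score = gender_accuracy(gold_labels, system_labels)
```

Under this assumption, the same function could score both an MT system and a speech translation system on the same sentences, so that the two accuracies are directly comparable.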