Many studies have revealed that word embeddings, language models, and models for specific downstream NLP tasks are prone to social biases, especially gender bias. Recently, these techniques have gradually been applied to automatic evaluation metrics for text generation. In this paper, we propose an evaluation method based on the Word Embedding Association Test (WEAT) and the Sentence Embedding Association Test (SEAT) to quantify social biases in evaluation metrics, and we find that social biases are also widely present in some model-based automatic evaluation metrics. Moreover, we construct gender-swapped meta-evaluation datasets to explore the potential impact of gender bias in image captioning and text summarization tasks. Results show that, given gender-neutral references, model-based evaluation metrics may prefer the male hypothesis, and their performance, i.e., the correlation between evaluation metrics and human judgments, usually varies more significantly after gender swapping.
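For concreteness, below is a minimal sketch, not the authors' implementation, of the standard WEAT effect size (Caliskan et al., 2017), which WEAT and SEAT compute over word and sentence embeddings respectively, together with a toy word-level gender-swapping helper of the kind used to build swapped meta-evaluation sets. The embedding vectors, word lists, and function names are illustrative placeholders.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus attribute set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Effect size d = (mean_x s(x,A,B) - mean_y s(y,A,B)) / std over X ∪ Y.

    X, Y: lists of target embeddings (e.g. career vs. family terms);
    A, B: lists of attribute embeddings (e.g. male vs. female terms).
    SEAT applies the same statistic to sentence embeddings of templated sentences.
    """
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# Toy gender swapping for building swapped references/hypotheses; real swap
# lists are much larger and must handle casing, morphology, and names.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "man": "woman", "woman": "man", "boy": "girl", "girl": "boy"}

def gender_swap(text):
    return " ".join(SWAPS.get(tok, tok) for tok in text.split())
```

In a gender-swapped meta-evaluation, the metric-human correlation is recomputed on the swapped data; a large gap between the original and swapped correlations indicates that the metric's judgments are sensitive to gender.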