Recent advances in deep learning techniques have enabled machines to generate cohesive open-ended text when prompted with a sequence of words as context. While these models now empower many downstream applications from conversation bots to automatic storytelling, they have been shown to generate texts that exhibit social biases. To systematically study and benchmark social biases in open-ended language generation, we introduce the Bias in Open-Ended Language Generation Dataset (BOLD), a large-scale dataset that consists of 23,679 English text generation prompts for bias benchmarking across five domains: profession, gender, race, religion, and political ideology. We also propose new automated metrics for toxicity, psycholinguistic norms, and text gender polarity to measure social biases in open-ended text generation from multiple angles. An examination of text generated from three popular language models reveals that the majority of these models exhibit a larger social bias than human-written Wikipedia text across all domains. With these results we highlight the need to benchmark biases in open-ended language generation and caution users of language generation models on downstream tasks to be cognizant of these embedded prejudices.
翻译:最近深层次学习技术的进步使机器能够产生具有凝聚力的开放文本,同时以一系列文字作为背景。这些模型现在赋予了许多下游应用,从谈话机器人到自动讲故事,赋予了许多下游应用能力,从对话机器人到自动讲故事,但是这些模型显示它们产生了具有社会偏见的文本。为了系统地研究和衡量不开放语言一代中的社会偏见,我们在开放语言生成数据集(BOLD)中引入了“Bias”,这是一个大型数据集,由23 679个英文文本生成组成,促使在以下五个领域建立偏见基准:专业、性别、种族、宗教和政治意识形态。我们还提出了关于毒性、精神语言规范以及文字性别极性的新自动计量标准,以衡量从多重角度在不开放文本生成过程中的社会偏见。对三种流行语言模型产生的文本进行的审查表明,这些模型中的大多数在各个领域都表现出比人写维基百科文本更大的社会偏见。通过这些结果,我们强调需要将开放语言生成中的偏见作为基准,并提醒下游任务的语言生成模式的使用者认识到这些根深蒂固的偏见。