Pre-trained language models learn socially harmful biases from their training corpora, and may repeat these biases when used for generation. We study gender biases associated with the protagonist in model-generated stories. Such biases may be expressed either explicitly ("women can't park") or implicitly (e.g. an unsolicited male character guides her into a parking space). We focus on implicit biases, and use a commonsense reasoning engine to uncover them. Specifically, we infer and analyze the protagonist's motivations, attributes, mental states, and implications for others. Our findings regarding implicit biases are in line with prior work that studied explicit biases, for example showing that female characters' portrayal is centered around appearance, while male characters' portrayal focuses on intellect.
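As a minimal sketch of the kind of inference step described above, the snippet below queries a COMET-style commonsense model for a protagonist's motivations, attributes, mental states, and effects on other characters, given a single story event. The checkpoint path, the ATOMIC-style relation set, and the "head relation [GEN]" prompt format are assumptions for illustration, not the authors' released pipeline.

```python
# Hedged sketch: querying a COMET-style seq2seq model for commonsense
# inferences about a story protagonist. Checkpoint name and prompt format
# are assumptions, not the paper's released code.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "path/to/comet-atomic-2020-bart"  # assumed COMET-style checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# ATOMIC-style relations roughly matching the dimensions analyzed in the paper.
RELATIONS = {
    "xIntent": "motivation",       # why the protagonist acts
    "xAttr": "attribute",          # how the protagonist is perceived
    "xReact": "mental state",      # how the protagonist feels
    "oReact": "effect on others",  # how other characters react
}

def infer_dimensions(event: str) -> dict:
    """Generate one commonsense inference per relation for a story event."""
    inferences = {}
    for relation, label in RELATIONS.items():
        # Many COMET checkpoints expect "<head> <relation> [GEN]" prompts;
        # treat this format as an assumption if using a different engine.
        prompt = f"{event} {relation} [GEN]"
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(**inputs, num_beams=5, max_new_tokens=16)
        inferences[label] = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return inferences

print(infer_dimensions("She parks the car in a tight spot"))
```

Aggregating such inferences over many generated stories, split by protagonist gender, is one way to surface the implicit biases the abstract refers to.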