This paper presents exploratory work on whether and to what extent biases against queer and trans people are encoded in large language models (LLMs) such as BERT. We also propose a method for reducing these biases in downstream tasks: finetuning the models on data written by and/or about queer people. To measure anti-queer bias, we introduce a new benchmark dataset, WinoQueer, modeled after other bias-detection benchmarks but addressing homophobic and transphobic biases. We found that BERT shows significant homophobic bias, but this bias can be mostly mitigated by finetuning BERT on a natural language corpus written by members of the LGBTQ+ community.
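As an illustration of the finetuning approach described above, the sketch below shows continued masked-language-model training of BERT on a plain-text corpus using the HuggingFace Transformers library. This is a minimal, assumed setup, not the authors' released code: the corpus path `queer_corpus.txt`, the output directory, and all hyperparameters are hypothetical placeholders.

```python
# Minimal sketch (assumed setup): continued masked-language-model finetuning of
# BERT on a plain-text corpus, e.g. text written by or about LGBTQ+ people.
# File paths and hyperparameters below are illustrative, not the authors' values.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical corpus file: one document of community-authored text per line.
dataset = load_dataset("text", data_files={"train": "queer_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens so BERT is trained with its original MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-finetuned-lgbtq",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

The finetuned checkpoint can then be scored against a bias benchmark such as WinoQueer in the same way as the base model, allowing a before/after comparison of the measured bias.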