Detecting social bias in text is challenging due to nuance, subjectivity, and the difficulty of obtaining good-quality labeled datasets at scale, especially given the evolving nature of social biases and society. To address these challenges, we propose a few-shot instruction-based method for prompting pre-trained language models (LMs). We select a few class-balanced exemplars from a small support repository that are closest to the query to be labeled in the embedding space. We then provide the LM with an instruction that consists of this subset of labeled exemplars, the query text to be classified, and a definition of bias, and prompt it to make a decision. We demonstrate that large LMs used in a few-shot context can detect different types of fine-grained biases with accuracy similar, and sometimes superior, to fine-tuned models. We observe that the largest 530B-parameter model is significantly more effective at detecting social bias than smaller models (achieving at least a 13% improvement in AUC over the other models). It also maintains a high AUC (dropping less than 2%) when the labeled repository is reduced to as few as $100$ samples. Large pretrained language models thus make it easier and quicker to build new bias detectors.
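As a rough illustration of the procedure described above, the sketch below shows class-balanced nearest-neighbor exemplar selection from a small labeled support set, followed by construction of an instruction prompt containing a bias definition, the retrieved exemplars, and the query. The \texttt{embed} and \texttt{generate} functions, the wording of the definition, and the number of exemplars per class are placeholders assumed for illustration; they are not specified in this abstract.

\begin{verbatim}
# Hypothetical sketch of the few-shot prompting pipeline; `embed` and
# `generate` stand in for an unspecified sentence-embedding model and the
# pre-trained LM's completion API.
from collections import defaultdict
import numpy as np

def embed(texts):
    """Placeholder: map a list of texts to unit-norm embedding vectors."""
    raise NotImplementedError

def generate(prompt):
    """Placeholder: query the pre-trained LM and return its completion."""
    raise NotImplementedError

BIAS_DEFINITION = ("Biased text expresses prejudice against a person or "
                   "group based on their identity.")  # assumed wording

def select_exemplars(query, support, k_per_class=2):
    """Pick the k exemplars per class closest to the query in embedding space."""
    q = embed([query])[0]
    by_class = defaultdict(list)
    for text, label in support:
        by_class[label].append(text)
    chosen = []
    for label, texts in by_class.items():
        sims = embed(texts) @ q               # cosine similarity (unit-norm)
        top = np.argsort(-sims)[:k_per_class]  # nearest exemplars of this class
        chosen.extend((texts[i], label) for i in top)
    return chosen

def classify(query, support):
    """Build the instruction (definition + exemplars + query) and ask the LM."""
    lines = [f"Definition: {BIAS_DEFINITION}", ""]
    for text, label in select_exemplars(query, support):
        lines.append(f"Text: {text}\nBiased: {label}")
    lines.append(f"Text: {query}\nBiased:")
    return generate("\n".join(lines)).strip()
\end{verbatim}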