Prompting with natural language task descriptions has emerged as a popular way to elicit reasonably accurate outputs from large-scale generative language models with little to no in-context supervision. It also offers insight into how well language models capture the semantics of a wide range of downstream tasks purely from self-supervised pre-training on massive corpora of unlabeled text. Such models have inevitably also been exposed to a great deal of undesirable content, such as racist and sexist language, yet there is limited work on how aware models are along these dimensions. In this paper, we define and comprehensively evaluate how well such language models capture the semantics of four bias-related tasks: diagnosis, identification, extraction, and rephrasing. We define three broad classes of task descriptions for these tasks: statement, question, and completion, with numerous lexical variants within each class. We study the efficacy of prompting for each task using these classes and the null task description, across several decoding methods and few-shot examples. Our analyses indicate that language models can perform these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation. We believe our work is an important step towards unbiased language models, as it quantifies the limits of current self-supervision objectives at accomplishing such sociologically challenging tasks.
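To make the three task-description classes and the null description concrete, the following is a minimal sketch of how prompts for the diagnosis task might be assembled. It is not the paper's implementation: the template wording, the `Text:`/`Answer:` layout, and the names `DESCRIPTIONS` and `build_prompt` are all assumptions chosen for illustration.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical task descriptions for the "diagnosis" task (is the text biased?).
# The wording is illustrative only and is not taken from the paper.
DESCRIPTIONS = {
    "statement": "Determine whether the text below is biased.",
    "question": "Is the text below biased?",
    "completion": "The text above is",
}


def build_prompt(
    text: str,
    description_class: Optional[str] = None,
    few_shot: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a prompt; description_class=None gives the null task description."""
    if description_class is not None and description_class not in DESCRIPTIONS:
        raise ValueError(f"unknown description class: {description_class!r}")

    lines = []
    # Statement and question descriptions are placed once, as a header.
    if description_class in ("statement", "question"):
        lines.append(DESCRIPTIONS[description_class])

    # A completion description replaces the generic answer cue after each text,
    # so the model is asked to continue the sentence.
    if description_class == "completion":
        cue = DESCRIPTIONS["completion"]
    else:
        cue = "Answer:"

    # Few-shot examples are (text, label) pairs shown before the query.
    for example_text, example_label in few_shot:
        lines.append(f"Text: {example_text}")
        lines.append(f"{cue} {example_label}")

    # The query ends with the bare cue, leaving the label for the model to generate.
    lines.append(f"Text: {text}")
    lines.append(cue)
    return "\n".join(lines)
```

Calling `build_prompt(sentence)` with no class yields the null task description, while passing `"statement"`, `"question"`, or `"completion"` selects one of the three classes. The resulting string would then be passed to a generative model under whichever decoding method is being studied.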