In this paper, we investigate what types of stereotypical information are captured by pretrained language models. We present the first dataset comprising stereotypical attributes of a range of social groups and propose a method to elicit stereotypes encoded by pretrained language models in an unsupervised fashion. Moreover, we link the emergent stereotypes to their manifestation as basic emotions as a means to study their emotional effects in a more generalized manner. To demonstrate how our methods can be used to analyze emotion and stereotype shifts due to linguistic experience, we use fine-tuning on news sources as a case study. Our experiments expose how attitudes towards different social groups vary across models and how quickly emotions and stereotypes can shift at the fine-tuning stage.