Little attention has been paid to analyzing nationality bias in language models, even though nationality is widely used as a feature to improve the performance of social NLP models. This paper examines how a text generation model, GPT-2, accentuates pre-existing societal biases about country-based demonyms. We generate stories with GPT-2 for various nationalities and use sensitivity analysis to explore how the number of internet users and the economic status of a country affect the sentiment of the generated stories. To reduce the propagation of such biases through large language models (LLMs), we explore the debiasing method of adversarial triggering. Our results show that GPT-2 exhibits significant bias against countries with fewer internet users, and that adversarial triggering effectively reduces this bias.
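A minimal sketch of the kind of pipeline the abstract describes: prompting GPT-2 with a demonym-based template and scoring the sentiment of the generated story. This is an illustrative assumption, not the paper's exact setup; the prompt template, the list of demonyms, and the use of VADER for sentiment scoring are all placeholders (adversarial triggering would additionally prepend a positively framed trigger phrase to each prompt).

```python
# Illustrative sketch only (assumed, not the authors' exact pipeline):
# generate a short story from a demonym-based prompt and score its sentiment.
# Requires the `transformers` and `vaderSentiment` packages.
from transformers import pipeline, set_seed
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

set_seed(42)
generator = pipeline("text-generation", model="gpt2")
analyzer = SentimentIntensityAnalyzer()

# Hypothetical demonyms and prompt template, chosen only for illustration.
demonyms = ["American", "Mexican", "French"]
for demonym in demonyms:
    prompt = f"The {demonym} people are"
    story = generator(prompt, max_length=50, num_return_sequences=1)[0]["generated_text"]
    # VADER compound score lies in [-1, 1]; lower values indicate more negative sentiment.
    score = analyzer.polarity_scores(story)["compound"]
    print(f"{demonym}: {score:+.3f}  {story!r}")
```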