Blacks is to Anger as Whites is to Joy? Understanding Latent Affective Bias in Large Pre-trained Neural Language Models

Transformer-based large Pre-trained Language Models (PLMs) have driven groundbreaking advances and highly significant performance improvements in deep-learning-based Natural Language Processing. The wide availability of unlabeled, human-generated data, together with self-supervised learning, has accelerated the success of large PLMs in language generation, language understanding, and related tasks. At the same time, latent historical bias and unfairness in human minds towards a particular gender, race, etc., encoded unintentionally or intentionally into the corpora, harms large PLMs and calls into question their utility and efficacy in many real-world applications, particularly for protected groups. In this paper, we present an extensive investigation towards understanding the existence of "Affective Bias" in large PLMs, that is, any biased association of emotions such as anger, fear, or joy with a particular gender, race, or religion, with respect to the downstream task of textual emotion detection. We begin at the corpus level, searching for imbalanced distributions of affective words within a domain in the large-scale corpora used to pre-train and fine-tune PLMs. Then, to quantify affective bias in model predictions, we perform an extensive set of class-based and intensity-based evaluations using various bias evaluation corpora. Our results show statistically significant affective bias in PLM-based emotion detection systems, indicating biased association of certain emotions with a particular gender, race, or religion.
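As a rough illustration of what a corpus-level search for imbalanced affective-word distributions might look like, the following minimal sketch counts how often emotion words co-occur with group terms in the same sentence. The lexicons, the function names `cooccurrence_counts` and `emotion_shares`, and the whitespace tokenization are all assumptions made for this example; they are not the lexicons, corpora, or procedure used in the paper.

```python
from collections import Counter

# Hypothetical toy lexicons (assumption): a real analysis would rely on an
# established emotion lexicon and curated lists of group-identifying terms.
EMOTION_WORDS = {
    "anger": {"angry", "furious"},
    "joy": {"happy", "joyful"},
}
GROUP_TERMS = {
    "female": {"she", "woman"},
    "male": {"he", "man"},
}


def cooccurrence_counts(sentences):
    """Count sentences in which a group term and an emotion word co-occur."""
    counts = {group: Counter() for group in GROUP_TERMS}
    for sentence in sentences:
        # Naive whitespace tokenization (assumption); punctuation is not handled.
        tokens = set(sentence.lower().split())
        for group, terms in GROUP_TERMS.items():
            if not tokens & terms:
                continue
            for emotion, words in EMOTION_WORDS.items():
                if tokens & words:
                    counts[group][emotion] += 1
    return counts


def emotion_shares(counts):
    """Convert raw counts into per-group proportions for each emotion."""
    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = {
            emotion: (counter[emotion] / total if total else 0.0)
            for emotion in EMOTION_WORDS
        }
    return shares


if __name__ == "__main__":
    toy_corpus = [
        "She was angry",
        "He was happy",
        "The woman felt furious",
        "The man is joyful and he is happy",
    ]
    print(emotion_shares(cooccurrence_counts(toy_corpus)))
```

A large gap between groups in the share of a given emotion would only be a first hint of imbalance; deciding whether such a gap is meaningful would still require a statistical test on a realistically sized corpus.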
