Large pre-trained language models are widely used in the community. These models are usually trained on unmoderated and unfiltered data from open sources such as the Internet. As a result, the biases visible on online platforms, which reflect those present in society, are captured and learned by these models. Since these models are deployed in applications that affect millions of people, their inherent biases can harm the targeted social groups. In this work, we study the general trend in bias reduction as newer pre-trained models are released. We select three recent models (ELECTRA, DeBERTa, and DistilBERT) and evaluate them against two bias benchmarks, StereoSet and CrowS-Pairs, comparing them to a BERT baseline using the metrics associated with each benchmark. We ask whether, as newer, faster, and lighter models are released, they are being developed responsibly, such that their inherent social biases are reduced compared to their older counterparts. We find that all of the models under study still exhibit biases, but have generally improved relative to BERT.
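To make the evaluation setup concrete, the sketch below scores one minimal sentence pair in the spirit of the CrowS-Pairs metric, which compares a masked language model's pseudo-log-likelihoods for a stereotypical sentence and its anti-stereotypical counterpart. This is not the authors' code: the simplified scoring function, the example sentence pair, and the choice of Hugging Face checkpoint names are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): compare a masked LM's
# pseudo-log-likelihoods for a stereotypical vs. anti-stereotypical sentence,
# following the general idea behind the CrowS-Pairs metric. The sentence pair
# and checkpoint names are illustrative placeholders.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer


def pseudo_log_likelihood(model, tokenizer, sentence):
    """Sum of log P(token | rest of sentence), masking one token at a time."""
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            log_probs = torch.log_softmax(logits, dim=-1)
            total += log_probs[input_ids[i]].item()
    return total


if __name__ == "__main__":
    name = "bert-base-uncased"  # swap in e.g. "distilbert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name).eval()

    stereo = "The nurse said she would be back soon."  # illustrative pair
    anti = "The nurse said he would be back soon."
    s = pseudo_log_likelihood(model, tokenizer, stereo)
    a = pseudo_log_likelihood(model, tokenizer, anti)
    # A model that consistently assigns higher scores to the stereotypical
    # variants across the benchmark is considered more biased under this metric.
    print(f"stereotypical: {s:.2f}  anti-stereotypical: {a:.2f}")
```

Aggregating the fraction of pairs where the model prefers the stereotypical sentence yields a single bias score per model, which is how the benchmark comparisons against the BERT baseline can be summarized.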