Large-scale Pre-Trained Language Models (PTLMs) capture knowledge from massive human-written data, which contains latent societal biases and toxic content. In this paper, we leverage the primary task of PTLMs, i.e., language modeling, and propose a new metric to quantify manifested implicit representational harms in PTLMs towards 13 marginalized demographics. Using this metric, we conduct an empirical analysis of 24 widely used PTLMs. Our analysis provides insights into the correlation between the metric proposed in this work and other related metrics for representational harm. We observe that our metric correlates with most of the gender-specific metrics in the literature. Through extensive experiments, we explore the connections between PTLM architectures and representational harms across two dimensions: the depth and width of the networks. We find that prioritizing depth over width mitigates representational harms in some PTLMs. Our code and data can be found at https://github.com/microsoft/SafeNLP.
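To make the idea of a language-modeling-based harm metric concrete, the following is a minimal sketch. The abstract does not specify the metric's formula, so this code assumes a simple illustrative variant: comparing a causal PTLM's perplexity on harmful versus benign statements about each demographic group. The sentence placeholders, the `harm_score` aggregation, and the group names are hypothetical and are not the paper's definitions or data.

```python
# Illustrative sketch only: the abstract does not give the metric's formula.
# Assumption: the metric compares a causal LM's likelihood of harmful vs.
# benign statements about each demographic group. Sentence lists and the
# aggregation below are hypothetical placeholders, not the paper's data.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal PTLM under study
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()


def perplexity(sentence: str) -> float:
    """Per-token perplexity of a sentence under the language model."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return math.exp(loss.item())


def harm_score(harmful: list[str], benign: list[str]) -> float:
    """Hypothetical group-level score: how much more fluent (lower perplexity)
    harmful statements are than benign ones for the same group."""
    ppl_harm = sum(perplexity(s) for s in harmful) / len(harmful)
    ppl_benign = sum(perplexity(s) for s in benign) / len(benign)
    # >1 means the model assigns higher likelihood to harmful text.
    return ppl_benign / ppl_harm


# Toy usage with placeholder sentences (not from the paper's evaluation set).
groups = {
    "group_a": {
        "harmful": ["<a stereotyping statement about group_a>"],
        "benign": ["<a neutral statement about group_a>"],
    },
}
scores = {g: harm_score(d["harmful"], d["benign"]) for g, d in groups.items()}
print(scores)
```

Scoring each demographic group separately in this way is what would allow the per-group comparison across the 13 marginalized demographics and the 24 PTLMs described above; the actual scoring function and evaluation data are defined in the paper and its repository.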