Unfair stereotypical biases (e.g., gender, racial, or religious biases) encoded in modern pretrained language models (PLMs) have negative ethical implications for widespread adoption of state-of-the-art language technology. To remedy this, a wide range of debiasing techniques have recently been introduced to remove such stereotypical biases from PLMs. Existing debiasing methods, however, directly modify all of the PLM's parameters, which -- besides being computationally expensive -- comes with the inherent risk of (catastrophic) forgetting of useful language knowledge acquired in pretraining. In this work, we propose a more sustainable modular debiasing approach based on dedicated debiasing adapters, dubbed ADELE. Concretely, we (1) inject adapter modules into the original PLM layers and (2) update only the adapters (i.e., we keep the original PLM parameters frozen) via language modeling training on a counterfactually augmented corpus. We showcase ADELE in gender debiasing of BERT: our extensive evaluation, encompassing three intrinsic and two extrinsic bias measures, renders ADELE very effective in bias mitigation. We further show that -- due to its modular nature -- ADELE, coupled with task adapters, retains fairness even after large-scale downstream training. Finally, by means of multilingual BERT, we successfully transfer ADELE to six target languages.
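The following is a minimal illustrative sketch of the adapter-based debiasing recipe described above (adapters injected into a frozen BERT, trained with masked language modeling on counterfactually augmented text). It is not the authors' implementation: the bottleneck adapter, its size, the learning rate, the forward-hook injection mechanism, and the tiny word-swap list for counterfactual augmentation are all assumptions made for illustration.

```python
# Sketch of adapter-based debiasing: bottleneck adapters (an assumption; the exact
# adapter architecture may differ) are added after each BERT layer via forward hooks,
# all original BERT parameters are frozen, and only the adapters are trained with
# MLM on a counterfactually augmented corpus.
import torch
import torch.nn as nn
from transformers import BertTokenizerFast, BertForMaskedLM, DataCollatorForLanguageModeling

MODEL_NAME = "bert-base-uncased"
tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)
model = BertForMaskedLM.from_pretrained(MODEL_NAME)

# (2) Keep the original PLM parameters frozen.
for p in model.parameters():
    p.requires_grad = False

class BottleneckAdapter(nn.Module):
    """Down-project -> nonlinearity -> up-project, added residually."""
    def __init__(self, hidden_size: int, bottleneck: int = 48):  # bottleneck size is an assumption
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# (1) Inject one adapter after each encoder layer via forward hooks.
adapters = nn.ModuleList(
    [BottleneckAdapter(model.config.hidden_size) for _ in model.bert.encoder.layer]
)

def make_hook(adapter):
    def hook(module, inputs, outputs):
        # A BertLayer returns a tuple whose first element is the hidden states.
        return (adapter(outputs[0]),) + outputs[1:]
    return hook

for layer, adapter in zip(model.bert.encoder.layer, adapters):
    layer.register_forward_hook(make_hook(adapter))

# Counterfactual augmentation: a toy gender word-swap list (illustrative only).
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his"}
def counterfactual(sentence: str) -> str:
    return " ".join(SWAPS.get(tok, tok) for tok in sentence.split())

corpus = ["he worked as a doctor .", "she stayed home with her children ."]
augmented = corpus + [counterfactual(s) for s in corpus]

# Language modeling training that updates only the adapter parameters.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
optimizer = torch.optim.AdamW(adapters.parameters(), lr=1e-4)  # lr is an assumption

model.train()
batch = collator([tokenizer(s) for s in augmented])
optimizer.zero_grad()
loss = model(**batch).loss
loss.backward()
optimizer.step()
```

Because the base model never changes, the trained adapter weights form a small, reusable debiasing module that can later be stacked with task-specific adapters, which is the property the abstract refers to as modularity.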