Large pre-trained language models contain societal biases and carry these biases forward into downstream tasks. Current in-processing bias mitigation approaches (like adversarial training) impose debiasing by updating a model's parameters, effectively transforming the model into a new, irreversibly debiased state. In this work, we propose a novel approach that develops stand-alone debiasing functionalities separate from the model, which can be integrated into the model on demand while keeping the core model untouched. Drawing on the concept of AdapterFusion in multi-task learning, we introduce DAM (Debiasing with Adapter Modules), a debiasing approach that first encapsulates arbitrary bias mitigation functionalities into separate adapters and then adds them to the model on demand to deliver fairness qualities. We conduct a large set of experiments on three classification tasks with gender, race, and age as protected attributes. Our results show that DAM improves or maintains the effectiveness of bias mitigation, avoids catastrophic forgetting in a multi-attribute scenario, and maintains on-par task performance, while granting parameter efficiency and easy switching between the original and debiased models.
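To make the architecture concrete, below is a minimal PyTorch sketch of the idea the abstract describes: per-attribute debiasing adapters attached to a frozen backbone and combined with the task adapter through an AdapterFusion-style attention layer. The class names (`BottleneckAdapter`, `AdapterFusionLayer`) and all hyperparameters are illustrative assumptions, not the paper's reference implementation.

```python
# Sketch of DAM's modular setup: frozen backbone + separate adapters,
# mixed on demand by an AdapterFusion-style attention layer.
# Names and dimensions are hypothetical, for illustration only.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Standard bottleneck adapter: down-project, nonlinearity, up-project,
    plus a residual connection. One such adapter would be trained per
    debiasing objective (e.g. removing gender information) while the
    backbone stays frozen."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.act(self.down(h)))


class AdapterFusionLayer(nn.Module):
    """AdapterFusion-style combination: the layer input acts as the query,
    each adapter's output as key/value; attention weights decide how much
    each (task or debiasing) adapter contributes."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.value = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, h: torch.Tensor, adapter_outputs: list) -> torch.Tensor:
        # Stack adapter outputs: (batch, seq, n_adapters, hidden)
        stacked = torch.stack(adapter_outputs, dim=2)
        q = self.query(h).unsqueeze(2)                  # (b, s, 1, d)
        k = self.key(stacked)                           # (b, s, n, d)
        v = self.value(stacked)                         # (b, s, n, d)
        scores = (q * k).sum(-1) / h.size(-1) ** 0.5    # (b, s, n)
        weights = scores.softmax(dim=-1).unsqueeze(-1)  # (b, s, n, 1)
        return h + (weights * v).sum(dim=2)             # residual mix


# Usage: `h` stands in for a frozen transformer layer's output. Fusion
# mixes the task adapter with a debiasing adapter without touching the
# backbone weights, so debiasing can be switched on or off on demand.
hidden = 768
task_adapter = BottleneckAdapter(hidden)
gender_adapter = BottleneckAdapter(hidden)  # trained with a debiasing loss
fusion = AdapterFusionLayer(hidden)

h = torch.randn(2, 16, hidden)
mixed = fusion(h, [task_adapter(h), gender_adapter(h)])
print(mixed.shape)  # torch.Size([2, 16, 768])
```

Because each debiasing functionality lives in its own adapter, dropping the `gender_adapter` output from the fusion call recovers the (task-adapted) behavior of the original model, which is the reversibility property the abstract emphasizes.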