Although pre-trained language models such as BERT achieve state-of-the-art performance on many language understanding tasks, they have been shown to inherit strong gender bias from their training data. Existing studies addressing the gender bias of pre-trained models usually collect new gender-neutral data and conduct a second phase of pre-training on the released model with such data. However, given the limited size of the gender-neutral data and its potential distributional mismatch with the original pre-training data, catastrophic forgetting can occur during second-phase pre-training. Forgetting information learned from the original training data may damage the model's downstream performance by a large margin. In this work, we first empirically show that even if the gender-neutral data for second-phase pre-training is drawn from the original training data, catastrophic forgetting still occurs when the gender-neutral data is smaller than the original training data. We then propose a new method, GEnder Equality Prompt (GEEP), to improve the gender fairness of pre-trained models without forgetting. GEEP freezes all pre-trained parameters and trains new embeddings of profession names as gender equality prompts conditioned on the frozen model; since no pre-trained parameters are updated, forgetting of information from the original training data is alleviated to the largest extent. Empirical results show that GEEP not only achieves state-of-the-art performance on gender debiasing in applications such as pronoun prediction and coreference resolution, but also matches the original pre-trained model on general downstream tasks such as GLUE, with little forgetting.
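To make the mechanism concrete, below is a minimal sketch of the GEEP idea as the abstract describes it: freeze every pre-trained parameter and train only new embeddings for profession names, which act as gender equality prompts. The model name, the profession list, the masked-pronoun training example, and the training loop are all illustrative assumptions, not the paper's exact setup.

```python
# Minimal GEEP-style sketch (assumptions: Hugging Face transformers,
# bert-base-uncased, a toy profession list, and a single MLM step).
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Freeze all pre-trained parameters so original knowledge is preserved.
for param in model.parameters():
    param.requires_grad = False

# Hypothetical profession names whose embeddings will be re-learned.
professions = ["nurse", "engineer", "doctor"]
prof_ids = [tokenizer.convert_tokens_to_ids(p) for p in professions]

embed = model.bert.embeddings.word_embeddings  # frozen input embeddings

# New trainable prompt embeddings, initialized from the original ones.
prompt_embeds = torch.nn.Parameter(embed.weight[prof_ids].clone())
optimizer = torch.optim.Adam([prompt_embeds], lr=1e-4)

def replace_profession_embeddings(input_ids):
    """Swap in the trainable prompt embeddings wherever a profession occurs."""
    inputs_embeds = embed(input_ids)
    for i, pid in enumerate(prof_ids):
        inputs_embeds[input_ids == pid] = prompt_embeds[i]
    return inputs_embeds

# One illustrative masked-LM step on a pronoun-prediction example.
text = "The engineer said [MASK] would finish the project."
batch = tokenizer(text, return_tensors="pt")
labels = torch.full_like(batch["input_ids"], -100)  # ignore unmasked tokens
mask_pos = batch["input_ids"] == tokenizer.mask_token_id
labels[mask_pos] = tokenizer.convert_tokens_to_ids("she")  # illustrative target

inputs_embeds = replace_profession_embeddings(batch["input_ids"])
outputs = model(inputs_embeds=inputs_embeds,
                attention_mask=batch["attention_mask"],
                labels=labels)
outputs.loss.backward()  # gradients flow only into prompt_embeds
optimizer.step()
optimizer.zero_grad()
```

Because the optimizer touches only the new profession embeddings, the frozen backbone keeps its original behavior on tasks such as GLUE, which is the source of the method's resistance to catastrophic forgetting.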