Pre-trained language models reflect the inherent social biases of their training corpora. Many methods have been proposed to mitigate this issue, but they often either fail to remove bias or sacrifice model accuracy. We use conceptors, a soft projection method, to identify and remove the bias subspace in the contextual embeddings of BERT and GPT. We propose two methods of applying conceptors: (1) bias subspace projection by post-processing, and (2) a new architecture, conceptor-intervened BERT (CI-BERT), which explicitly incorporates the conceptor projection into all layers during training. We find that conceptor post-processing achieves state-of-the-art debiasing results while maintaining or improving BERT's performance on the GLUE benchmark. Although CI-BERT's training takes all layers' bias into account and can outperform its post-processing counterpart in bias mitigation, CI-BERT reduces the language model's accuracy. We also show the importance of carefully constructing the bias subspace. The best results are obtained by removing outliers from the list of biased words, intersecting them (via the conceptor AND operation), and computing their embeddings using sentences from a cleaner corpus.
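For readers unfamiliar with conceptors, the following minimal NumPy sketch illustrates the post-processing step the abstract describes: fitting a conceptor to embeddings of a biased word list and applying its soft complement (NOT) as a projection. The aperture value, embedding dimension, random stand-in matrices, and function names are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def conceptor(X, alpha=2.0):
    """Fit a conceptor C = R (R + alpha^{-2} I)^{-1} to embeddings X (n_samples x dim),
    where R is the (uncentered) correlation matrix of X."""
    n, d = X.shape
    R = X.T @ X / n
    return R @ np.linalg.inv(R + alpha ** -2 * np.eye(d))

def negate(C):
    """Soft complement NOT C = I - C; projecting with it suppresses the captured subspace."""
    return np.eye(C.shape[0]) - C

def conj(C1, C2):
    """Conceptor AND (intersection): (C1^{-1} + C2^{-1} - I)^{-1}.
    This form assumes invertible conceptors; the general definition uses pseudo-inverses."""
    d = C1.shape[0]
    return np.linalg.inv(np.linalg.inv(C1) + np.linalg.inv(C2) - np.eye(d))

# Usage sketch: debias token embeddings E with a conceptor fit on a biased word list B.
B = np.random.randn(200, 768)  # stand-in for contextual embeddings of biased words
E = np.random.randn(10, 768)   # stand-in for embeddings to be debiased
C = conceptor(B, alpha=2.0)
E_debiased = E @ negate(C)     # soft projection away from the bias subspace
```

Unlike a hard projection onto the orthogonal complement of a bias direction, the conceptor shrinks each principal direction of the bias subspace in proportion to its variance, which is why it is described as "soft".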