Language model debiasing has emerged as an important field of study in the NLP community. Numerous debiasing techniques have been proposed, but bias ablation remains an unaddressed issue. We demonstrate a novel framework for inspecting bias in pre-trained transformer-based language models via movement pruning. Given a model and a debiasing objective, our framework finds a subset of the model containing less bias than the original model. We implement our framework by pruning the model while fine-tuning it on the debiasing objective. Only the pruning scores are optimized: parameters coupled with the model's weights that act as gates. We experiment with pruning attention heads, an important building block of transformers: we prune square blocks, and also establish a new way of pruning entire heads. Lastly, we demonstrate the usage of our framework on gender bias, and based on our findings, we propose an improvement to an existing debiasing method. Additionally, we re-discover a bias-performance trade-off: the better the model performs, the more bias it contains.
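To make the gating idea concrete, below is a minimal PyTorch sketch of movement-pruning scores acting as gates over attention heads, with the pretrained weights frozen and only the scores trained. All names (`HeadGate`, `keep_ratio`, the head count, and the stand-in loss) are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class HeadGate(nn.Module):
    """Learnable per-head pruning scores that gate attention-head outputs.

    Hypothetical sketch: the top-k thresholding and straight-through
    estimator follow the general movement-pruning recipe, not necessarily
    the paper's exact configuration.
    """
    def __init__(self, num_heads: int):
        super().__init__()
        # One movement-pruning score per attention head.
        self.scores = nn.Parameter(torch.zeros(num_heads))

    def forward(self, head_outputs: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
        # head_outputs: (batch, num_heads, seq_len, head_dim)
        k = max(1, int(self.scores.numel() * keep_ratio))
        threshold = torch.topk(self.scores, k).values.min()
        hard_mask = (self.scores >= threshold).float()
        # Straight-through estimator: hard 0/1 gate in the forward pass,
        # but gradients flow to the scores as if the gate were identity.
        mask = hard_mask + self.scores - self.scores.detach()
        return head_outputs * mask.view(1, -1, 1, 1)

# Toy usage: model weights would be frozen; only the scores receive gradients.
gate = HeadGate(num_heads=12)
x = torch.randn(2, 12, 16, 64)   # (batch, heads, seq, head_dim)
out = gate(x, keep_ratio=0.5)    # half the heads are zeroed out
loss = out.pow(2).mean()         # stand-in for a debiasing objective
loss.backward()                  # gradients reach only gate.scores
```

In a full setup, one such gate per transformer layer would wrap the attention output, the pretrained parameters would be set to `requires_grad = False`, and an optimizer over the score parameters alone would be fine-tuned on the debiasing objective.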