Modern pretrained language models are critical components of NLP pipelines. Yet, they suffer from spurious correlations, poor out-of-domain generalization, and biases. Inspired by recent progress in causal machine learning, in particular the invariant risk minimization (IRM) paradigm, we propose invariant language modeling, a framework for learning invariant representations that generalize better across multiple environments. In particular, we adapt a game-theoretic implementation of IRM (IRM-games) to language models, where the invariance emerges from a specific training schedule in which all the environments compete to optimize their own environment-specific loss by updating subsets of the model in a round-robin fashion. In a series of controlled experiments, we demonstrate the ability of our method to (i) remove structured noise, (ii) ignore specific spurious correlations without affecting global performance, and (iii) achieve better out-of-domain generalization. The framework incurs negligible computational overhead compared to standard training, requires no change to the local loss, and can be applied to any language model architecture. We believe this framework holds promise for mitigating spurious correlations and biases in language models.
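To make the training schedule concrete, below is a minimal sketch in the spirit of IRM-games: each environment owns one output head on top of a shared encoder, the model's prediction is the ensemble (average) of the heads, and environments take turns minimizing their own loss. All names here (`encoder`, `heads`, `env_loaders`, `num_rounds`) are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a round-robin, environment-wise training schedule
# (IRM-games style); not the paper's exact code.
import itertools

import torch
import torch.nn.functional as F


def train_round_robin(encoder, heads, env_loaders, num_rounds, lr=1e-4):
    """encoder: shared representation module; heads[e]: classifier owned by
    environment e; env_loaders[e]: iterable of (x, y) batches for environment e."""
    # Each environment's optimizer covers only "its" subset of the model:
    # the shared encoder plus its own head.
    optimizers = [
        torch.optim.Adam(list(encoder.parameters()) + list(h.parameters()), lr=lr)
        for h in heads
    ]
    env_iters = [itertools.cycle(dl) for dl in env_loaders]  # endless batch streams

    def predict(x):
        z = encoder(x)                               # shared representation
        logits = torch.stack([h(z) for h in heads])  # (num_envs, batch, num_classes)
        return logits.mean(dim=0)                    # ensemble over environment heads

    for _ in range(num_rounds):
        # Round-robin: environments play in turn, each optimizing its own
        # environment-specific loss while the other players' parameters stay fixed.
        for e in range(len(heads)):
            x, y = next(env_iters[e])
            loss = F.cross_entropy(predict(x), y)    # environment e's own loss
            for opt in optimizers:                   # clear stale gradients everywhere
                opt.zero_grad()
            loss.backward()
            optimizers[e].step()                     # only environment e's parameters move
    return encoder, heads
```

The intended intuition, following IRM-games, is that at equilibrium the ensemble prediction must be simultaneously near-optimal for every environment, which discourages the shared representation from relying on environment-specific (spurious) features; which parameters each environment is allowed to update is a design choice, and the split shown above is one plausible instantiation.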