Large pretrained language models are critical components of modern NLP pipelines. Yet, they suffer from spurious correlations, poor out-of-domain generalization, and biases. Inspired by recent progress in causal machine learning, in particular the invariant risk minimization (IRM) paradigm, we propose invariant language modeling, a framework for learning invariant representations that generalize better across multiple environments. In particular, we adapt a game-theoretic formulation of IRM (IRM-games) to language models, where the invariance emerges from a specific training schedule in which all the environments compete to optimize their own environment-specific loss by updating subsets of the model in a round-robin fashion. We focus on controlled experiments to precisely demonstrate the ability of our method to (i) remove structured noise, (ii) ignore specific spurious correlations without affecting global performance, and (iii) achieve better out-of-domain generalization. These benefits come with a negligible computational overhead compared to standard training, do not require changing the local loss, and can be applied to any language model. We believe this framework is promising to help mitigate spurious correlations and biases in language models.
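To make the round-robin training schedule concrete, below is a minimal PyTorch sketch of an IRM-games-style update loop for a language model. It assumes a shared encoder with one output head per environment and an ensemble (averaged) prediction; the module names, the toy GRU encoder, the toy data loaders, and the exact parameter subsets updated by each environment are illustrative assumptions, not the paper's implementation.

```python
# Sketch of a round-robin IRM-games training schedule for a language model.
# Assumption: a shared encoder ("backbone") with one LM head per environment;
# the model's prediction averages the heads. Each environment optimizes only
# its own loss on its own data, taking turns in a round-robin fashion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InvariantLM(nn.Module):
    def __init__(self, vocab_size, hidden_dim, num_envs):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.backbone = nn.GRU(hidden_dim, hidden_dim, batch_first=True)  # stand-in encoder
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, vocab_size) for _ in range(num_envs)]
        )

    def forward(self, tokens):
        h, _ = self.backbone(self.embed(tokens))
        # Ensemble prediction: average the environment-specific heads.
        return torch.stack([head(h) for head in self.heads], dim=0).mean(dim=0)

def train_round_robin(model, env_loaders, steps, lr=1e-3):
    # One optimizer per environment. Which subset of the model each environment
    # updates is a design choice; here each environment updates its own head
    # plus the shared encoder (mirroring the variant where the shared
    # representation is also trained).
    optimizers = [
        torch.optim.Adam(
            list(model.embed.parameters())
            + list(model.backbone.parameters())
            + list(model.heads[e].parameters()),
            lr=lr,
        )
        for e in range(len(env_loaders))
    ]
    iters = [iter(loader) for loader in env_loaders]
    for step in range(steps):
        e = step % len(env_loaders)  # environments take turns (round-robin)
        try:
            tokens, targets = next(iters[e])
        except StopIteration:
            iters[e] = iter(env_loaders[e])
            tokens, targets = next(iters[e])
        logits = model(tokens)
        # Standard (unchanged) local loss, computed only on environment e's data.
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        optimizers[e].zero_grad()
        loss.backward()
        optimizers[e].step()
```

The key point the sketch illustrates is that no global invariance penalty is added to the loss: each environment only ever minimizes its own standard language-modeling loss on its own data, and invariance is intended to emerge from the competition between heads induced by the alternating updates.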