Large language models (LLMs) have achieved state-of-the-art performance on a range of natural language understanding tasks. However, these LLMs may rely on dataset biases and artifacts as shortcuts for prediction, which significantly hurts their out-of-distribution (OOD) generalization and adversarial robustness. In this paper, we review recent developments that address the robustness challenge of LLMs. We first introduce the concepts of shortcut learning and the robustness challenge of LLMs. We then describe methods for identifying shortcut learning behavior in LLMs, characterize the causes of shortcut learning, and present mitigation solutions. Finally, we identify key remaining challenges and discuss the connections between this line of research and other directions.