Large language models (LLMs) are rapidly gaining popularity and have been widely adopted in real-world applications. While the quality of training data is essential, privacy concerns arise during data collection. Federated learning (FL) offers a solution by allowing multiple clients to collaboratively train LLMs without sharing their local data. However, FL introduces new challenges, such as model convergence issues caused by heterogeneous data and high communication costs. A comprehensive study is needed to address these challenges and guide future research. This paper surveys federated learning for LLMs (FedLLM), highlighting recent advances and future directions. We focus on two key aspects: fine-tuning and prompt learning in a federated setting, discussing existing work and the associated research challenges. Finally, we propose potential directions for federated LLMs, including pre-training, federated agents, and the use of LLMs for federated learning.