Federated Learning (FL) is an emerging paradigm that enables distributed users to collaboratively and iteratively train machine learning models without sharing their private data. Motivated by the effectiveness and robustness of self-attention-based architectures, researchers are turning to pre-trained Transformers (i.e., foundation models) instead of traditional convolutional neural networks in FL to leverage their excellent transfer learning capabilities. Despite recent progress, the role that pre-trained Transformer models play in FL remains unclear, that is, how to efficiently fine-tune these pre-trained models in FL and how FL users can benefit from this new paradigm. In this paper, we explore this issue and demonstrate that fine-tuned Transformers achieve excellent performance in FL, and that lightweight fine-tuning enables fast convergence and low communication costs. Concretely, we conduct a rigorous empirical study of three tuning methods (i.e., modifying the input, adding extra modules, and adjusting the backbone) using two types of pre-trained models (i.e., vision-language models and vision models) for FL. Our experiments show that 1) fine-tuning the bias terms of the backbone performs best when relying on a strong pre-trained model; 2) the vision-language model (e.g., CLIP) outperforms the pure vision model (e.g., ViT) and is more robust in few-shot settings; 3) compared to pure local training, FL with pre-trained models achieves higher accuracy because it alleviates over-fitting. We will release our code and encourage further exploration of pre-trained Transformers and FL.
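To make the bias-tuning method concrete, the sketch below shows how a single FL client could restrict local training to the backbone's bias terms. This is a minimal illustration assuming a PyTorch/timm setup; the backbone choice, head size, and optimizer settings are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import timm  # assumed backbone source; any pre-trained ViT provider works similarly

# Illustrative: load a pre-trained ViT backbone with a fresh 10-way head.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=10)

# Bias-only tuning: freeze everything except bias terms (and the newly
# initialized classification head), so each FL client trains and
# transmits only a tiny fraction of the model's parameters.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias") or name.startswith("head.")

num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
num_total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {num_trainable} / {num_total}")

# In each FL round, a client would run local steps with this optimizer,
# and the server would average only the trainable tensors across clients,
# which keeps per-round communication low.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

Because only bias tensors (plus the head) are updated, the payload exchanged each round shrinks from the full model to well under one percent of its parameters, which is the source of the fast convergence and low communication costs noted above.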