Federated Learning (FL) is an emerging paradigm that enables distributed users to collaboratively and iteratively train machine learning models without sharing their private data. Motivated by the effectiveness and robustness of self-attention-based architectures, researchers are turning to pre-trained Transformers (i.e., foundation models) instead of traditional convolutional neural networks in FL to leverage their excellent transfer learning capabilities. Despite recent progress, the role that pre-trained Transformer models play in FL remains unclear, that is, how to efficiently fine-tune these pre-trained models in FL and how FL users can benefit from this new paradigm. In this paper, we explore this issue and demonstrate that fine-tuned Transformers achieve excellent performance in FL, and that lightweight fine-tuning enables fast convergence and low communication costs. Concretely, we conduct a rigorous empirical study of three tuning methods (i.e., modifying the input, adding extra modules, and adjusting the backbone) using two types of pre-trained models (i.e., vision-language models and vision models) for FL. Our experiments show that 1) fine-tuning the bias terms of the backbone performs best when relying on a strong pre-trained model; 2) the vision-language model (e.g., CLIP) outperforms the pure vision model (e.g., ViT) and is more robust in few-shot settings; 3) compared to pure local training, FL with pre-trained models achieves higher accuracy because it alleviates over-fitting. We will release our code and encourage further exploration of pre-trained Transformers and FL.
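To make the bias-tuning method concrete, the sketch below shows how a single FL client could restrict local training to the backbone's bias terms. This is a minimal illustration assuming a PyTorch/timm setup; the backbone choice, head size, and optimizer settings are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import timm  # assumed backbone source; any pre-trained ViT provider works similarly

# Illustrative: load a pre-trained ViT backbone with a fresh 10-way head.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=10)

# Bias-only tuning: freeze everything except bias terms (and the newly
# initialized classification head), so each FL client trains and
# transmits only a tiny fraction of the model's parameters.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias") or name.startswith("head.")

num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
num_total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {num_trainable} / {num_total}")

# In each FL round, a client would run local steps with this optimizer,
# and the server would average only the trainable tensors across clients,
# which keeps per-round communication low.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

Because only bias tensors (plus the head) are updated, the payload exchanged each round shrinks from the full model to well under one percent of its parameters, which is the source of the fast convergence and low communication costs noted above.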