Federated learning (FL) has predominantly focused on collaboratively training deep networks from scratch, and on the many challenges that arise in doing so, such as communication cost, robustness to heterogeneous data, and support for diverse device capabilities. However, there is no unified framework that addresses all these problems together. This paper studies the challenges and opportunities of exploiting pre-trained Transformer models in FL. In particular, we propose to efficiently adapt such pre-trained models by injecting a novel attention-based adapter module at each transformer block that both modulates the forward pass and makes an early prediction. Training only the lightweight adapters with FL yields fast and communication-efficient learning, even in the presence of heterogeneous data and devices. Extensive experiments on standard FL benchmarks, including CIFAR-100, FEMNIST, and SpeechCommandsv2, demonstrate that this simple framework provides fast and accurate FL while supporting heterogeneous device capabilities, efficient personalization, and scalable-cost anytime inference.
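As a rough illustration of the idea (not the paper's exact design), the PyTorch sketch below shows how a small attention-based adapter with an early-exit head could be attached to a frozen pre-trained transformer block. The module names `AttentionAdapter` and `AdaptedBlock`, the bottleneck width, and the mean-pooled early-exit classifier are illustrative assumptions rather than the proposed architecture.

```python
import torch
import torch.nn as nn

class AttentionAdapter(nn.Module):
    """Lightweight adapter: a bottleneck self-attention whose output both
    modulates the block's features and feeds an early-exit classifier.
    Dimensions and wiring are illustrative, not taken from the paper."""
    def __init__(self, dim: int, bottleneck: int = 64, num_classes: int = 100):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.attn = nn.MultiheadAttention(bottleneck, num_heads=1, batch_first=True)
        self.up = nn.Linear(bottleneck, dim)
        self.head = nn.Linear(dim, num_classes)  # early-exit prediction head

    def forward(self, h: torch.Tensor):
        # h: (batch, tokens, dim) features from the frozen transformer block
        z = self.down(h)
        z, _ = self.attn(z, z, z)
        h = h + self.up(z)                        # modulate the forward pass (residual)
        logits = self.head(h.mean(dim=1))         # early prediction from pooled tokens
        return h, logits

class AdaptedBlock(nn.Module):
    """Frozen pre-trained transformer block followed by a trainable adapter."""
    def __init__(self, block: nn.Module, dim: int, num_classes: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():         # backbone stays frozen
            p.requires_grad = False
        self.adapter = AttentionAdapter(dim, num_classes=num_classes)

    def forward(self, h: torch.Tensor):
        return self.adapter(self.block(h))
```

Under such a setup, each client would exchange only the adapter and head parameters with the server every round, keeping communication proportional to the adapter size, while a resource-limited device could run only a prefix of the blocks and use that block's early prediction, matching the abstract's claims of heterogeneous device support and scalable-cost anytime inference.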