Transformer-based pre-trained models have revolutionized NLP with their superior performance and generality. Fine-tuning pre-trained models for downstream tasks often requires private data, for which federated learning is the de-facto approach (i.e., FedNLP). However, our measurements show that FedNLP is prohibitively slow due to the large model sizes and the resultant high network/computation cost. Towards practical FedNLP, we identify adapters, small bottleneck modules inserted at various model layers, as the key building block. A key challenge is to properly configure the depth and width of adapters, to which the training speed and efficiency are highly sensitive. No silver-bullet configuration exists: the optimal choice varies across downstream NLP tasks, desired model accuracy, and client resources, and a non-optimal configuration could significantly slow down training. To automate adapter configuration, we propose AutoFedNLP, a framework that enhances existing FedNLP with two novel designs. First, AutoFedNLP progressively upgrades the adapter configuration throughout a training session. Second, AutoFedNLP continuously profiles future adapter configurations by allocating participant devices to trial groups. To minimize client-side computation, AutoFedNLP exploits the fact that a FedNLP client trains on the same samples repeatedly between consecutive changes of adapter configuration, and caches the computed activations on clients. Extensive experiments show that AutoFedNLP can reduce FedNLP's model convergence delay to no more than several hours, which is up to 155.5$\times$ faster than vanilla FedNLP and 48$\times$ faster than strong baselines.
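As a rough illustration of the adapter idea the abstract refers to, the following is a minimal NumPy sketch of one bottleneck adapter: a down-projection to a narrow width, a nonlinearity, an up-projection back to the hidden size, and a residual connection. All names and sizes here are illustrative assumptions, not the paper's actual implementation; the bottleneck `width` is the tunable "width" knob, and "depth" would correspond to how many transformer layers receive such a module.

```python
import numpy as np

def adapter_forward(h, W_down, W_up):
    """Bottleneck adapter: down-project, ReLU, up-project, residual add.

    h:      (batch, hidden) activations from a transformer sublayer
    W_down: (hidden, width) down-projection
    W_up:   (width, hidden) up-projection
    """
    z = np.maximum(h @ W_down, 0.0)  # ReLU in the bottleneck
    return h + z @ W_up              # residual connection

hidden = 768   # transformer hidden size (illustrative)
width = 32     # adapter bottleneck width -- the tunable "width"

rng = np.random.default_rng(0)
W_down = rng.standard_normal((hidden, width)) * 0.01
W_up = np.zeros((width, hidden))   # zero init: adapter starts as identity

h = rng.standard_normal((4, hidden))   # a batch of 4 token representations
out = adapter_forward(h, W_down, W_up)
```

Note that only `W_down` and `W_up` (2 * hidden * width parameters) would be trained and exchanged in federated rounds, far fewer than a full hidden-by-hidden layer, which is why adapter configuration dominates the network/computation cost discussed above.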