Limited compute and communication capabilities of edge users create a significant bottleneck for federated learning (FL) of large models. We consider a realistic, but much less explored, cross-device FL setting in which no client has the capacity to train a full large model, nor is willing to share any intermediate activations with the server. To this end, we present the Principal Sub-Model (PriSM) training methodology, which leverages the low-rank structure and kernel orthogonality of the model to train sub-models in the orthogonal kernel space. More specifically, by applying singular value decomposition (SVD) to the original kernels in the server model, PriSM first obtains a set of principal orthogonal kernels, each weighted by its singular value. Thereafter, PriSM utilizes our novel sampling strategy that independently selects different subsets of the principal kernels to create sub-models for clients. Importantly, a kernel with a larger singular value is assigned a higher sampling probability. Thus, each sub-model is a low-rank approximation of the full server model, and all clients together achieve near full-model training. Our extensive evaluations on multiple datasets in various resource-constrained settings show that PriSM yields up to a 10% improvement in performance over existing alternatives, with only around 20% sub-model training per client.
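To make the sampling idea concrete, below is a minimal sketch (not the authors' implementation) of the principal-kernel sampling step for a single linear layer: the server kernel is decomposed with SVD, a subset of orthogonal components is drawn with probability proportional to the singular values, and the selected factors form a client's low-rank sub-model. Function and parameter names such as `sample_submodel` and `keep_ratio` are illustrative assumptions.

```python
import numpy as np

def sample_submodel(weight: np.ndarray, keep_ratio: float, rng: np.random.Generator):
    """Sketch: build a low-rank sub-model from a sampled subset of principal kernels."""
    # SVD of the original kernel: weight = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(weight, full_matrices=False)

    # Sampling probability of each orthogonal component is proportional to its
    # singular value, so dominant components are selected more often.
    probs = s / s.sum()
    k = max(1, int(keep_ratio * len(s)))
    idx = rng.choice(len(s), size=k, replace=False, p=probs)

    # The client's sub-model is the factorization restricted to the sampled components.
    return U[:, idx], s[idx], Vt[idx, :]

# Example: a 64x64 server kernel; each client trains roughly a 20% sub-model.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
U_c, s_c, Vt_c = sample_submodel(W, keep_ratio=0.2, rng=rng)
W_sub = U_c @ np.diag(s_c) @ Vt_c   # low-rank approximation trained on-device
print(W_sub.shape, len(s_c))        # (64, 64) 12
```

Because each client independently draws its own subset of principal kernels, different clients cover different components, which is how the clients together approximate training of the full server model.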