Mixture-of-Experts (MoE) models enable scalable neural networks through conditional computation. However, deploying them with federated learning (FL) faces two critical challenges: 1) resource-constrained edge devices cannot store the full set of experts, and 2) non-IID data distributions cause severe expert load imbalance that degrades model performance. To address these challenges, we propose \textbf{FLEX-MoE}, a novel federated MoE framework that jointly optimizes expert assignment and load balancing under limited client capacity. Specifically, our approach introduces client-expert fitness scores that quantify the suitability of each expert for a client's local dataset based on training feedback, and employs an optimization-based assignment algorithm that maximizes client-expert specialization while enforcing balanced expert utilization system-wide. Unlike existing greedy methods that focus solely on personalization and ignore load imbalance, FLEX-MoE explicitly mitigates expert utilization skew, which is particularly severe in FL settings with heterogeneous data. Comprehensive experiments on three datasets demonstrate that FLEX-MoE achieves superior performance while maintaining balanced expert utilization across diverse resource-constrained scenarios.
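To make the capacity-constrained assignment step concrete, the following is a minimal sketch, not the FLEX-MoE algorithm itself: it assumes an illustrative fitness matrix, uniform per-client memory budgets, and a single per-expert load cap, and solves an LP relaxation of the assignment problem with SciPy before a simple per-client rounding step. The LP relaxation and rounding are stand-ins for whatever optimization the paper actually uses.

\begin{verbatim}
# Minimal sketch of capacity-constrained client-expert assignment with a
# system-wide load cap. NOT the FLEX-MoE algorithm: the fitness matrix,
# capacities, and LP relaxation below are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

def assign_experts(fitness, client_capacity, load_cap):
    """fitness: (n_clients, n_experts) suitability scores from training feedback.
    client_capacity: max experts each client can store.
    load_cap: max clients each expert may serve (balanced utilization)."""
    n_clients, n_experts = fitness.shape
    c = -fitness.ravel()                       # maximize total fitness
    A, b = [], []
    for i in range(n_clients):                 # per-client memory budget
        row = np.zeros(n_clients * n_experts)
        row[i * n_experts:(i + 1) * n_experts] = 1.0
        A.append(row); b.append(client_capacity[i])
    for j in range(n_experts):                 # per-expert load cap
        row = np.zeros(n_clients * n_experts)
        row[j::n_experts] = 1.0
        A.append(row); b.append(load_cap)
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=(0.0, 1.0))
    x = res.x.reshape(n_clients, n_experts)
    # Round the relaxation: each client keeps its top-capacity experts by x.
    # This respects client budgets but may slightly exceed the load cap.
    assign = np.zeros_like(x, dtype=bool)
    for i in range(n_clients):
        top = np.argsort(-x[i])[: client_capacity[i]]
        assign[i, top] = True
    return assign

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F = rng.random((8, 4))                     # 8 clients, 4 experts
    caps = np.full(8, 2)                       # each client stores 2 experts
    print(assign_experts(F, caps, load_cap=5).sum(axis=0))  # per-expert load
\end{verbatim}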