The sparse Mixture-of-Experts (MoE) architecture of large language models (LLMs) suffers from an inherent load-imbalance problem caused by its simplistic linear router, which ultimately leads to unstable and inefficient learning. To address this challenge, we introduce $\textbf{GMoE}$, a novel graph-based MoE framework designed to enhance collaboration among experts. In GMoE, a graph router captures collaboration signals among experts, allowing each expert to dynamically allocate information from the input data by exchanging information with its neighboring experts. Moreover, we propose two coordination strategies in GMoE, the $\textit{Poisson distribution-based distinction strategy}$ and the $\textit{Normal distribution-based balance strategy}$, to further unleash the capacity of each expert and improve model stability during LLM fine-tuning. We implement the graph MoE architecture with a parameter-efficient fine-tuning technique, Low-Rank Adaptation (LoRA). Extensive experiments on four real-world benchmark datasets demonstrate the effectiveness of GMoE and the benefits of fostering collaboration among multiple experts in LLM fine-tuning. Our implementation is available at https://github.com/BAI-LAB/GMoE.
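To make the abstract's architecture concrete, the following is a minimal, hedged sketch of a graph-based MoE layer with LoRA experts. It assumes a dense, learnable expert-adjacency matrix, a single round of neighbor aggregation over the gate logits, and top-$k$ dispatch; the class names (GraphRouter, LoRAExpert, GraphMoELayer) and all hyperparameters are illustrative assumptions, not the paper's exact implementation or its coordination strategies.

```python
# Minimal sketch of a graph-based MoE router combined with LoRA experts (PyTorch).
# Assumptions (not from the paper): dense learnable expert adjacency, one round of
# neighbor aggregation over gate logits, top-k sparse dispatch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One expert as a low-rank adapter update: B(A(x)), added to the frozen base output."""
    def __init__(self, d_model: int, rank: int = 8):
        super().__init__()
        self.A = nn.Linear(d_model, rank, bias=False)
        self.B = nn.Linear(rank, d_model, bias=False)
        nn.init.zeros_(self.B.weight)  # start with a zero update, as in standard LoRA

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.B(self.A(x))


class GraphRouter(nn.Module):
    """Linear gate whose logits are smoothed by message passing among experts."""
    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Learnable expert-expert adjacency, row-normalized at use time.
        self.adj = nn.Parameter(torch.eye(n_experts) + 0.01 * torch.randn(n_experts, n_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.gate(x)                    # (batch, n_experts)
        A = F.softmax(self.adj, dim=-1)          # row-stochastic adjacency
        logits = logits @ A.T                    # share routing signal with neighboring experts
        topk_val, topk_idx = logits.topk(self.top_k, dim=-1)
        weights = torch.zeros_like(logits).scatter_(-1, topk_idx, F.softmax(topk_val, dim=-1))
        return weights                           # sparse per-token mixing weights


class GraphMoELayer(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 4, rank: int = 8):
        super().__init__()
        self.router = GraphRouter(d_model, n_experts)
        self.experts = nn.ModuleList(LoRAExpert(d_model, rank) for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.router(x)                                      # (batch, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=-1)   # (batch, d_model, n_experts)
        return x + torch.einsum("bdn,bn->bd", outputs, weights)       # residual + weighted experts


# Usage example (hypothetical shapes):
# x = torch.randn(16, 768)
# y = GraphMoELayer(768)(x)
```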