Traditional Mixture-of-Experts (MoE) networks benefit from utilizing multiple smaller expert models as opposed to a single large network. However, these experts typically operate independently, leaving open the question of whether interconnecting them could further enhance the performance of MoE networks. In response, we introduce GRAPHMOE, a novel method aimed at augmenting the cognitive depth of language models via a self-rethinking mechanism built on Pseudo GraphMoE networks. GRAPHMOE employs a recurrent routing strategy to simulate iterative thinking steps, thereby facilitating the flow of information among expert nodes. We implement the GRAPHMOE architecture using Low-Rank Adaptation (LoRA) techniques and conduct extensive experiments on various benchmark datasets. The experimental results show that GRAPHMOE outperforms other LoRA-based models, achieving state-of-the-art (SOTA) performance. Additionally, this study explores a novel recurrent routing strategy that may inspire further advances in enhancing the reasoning capabilities of language models.
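To make the idea of a recurrent ("self-rethinking") router over LoRA experts concrete, the following is a minimal PyTorch sketch, not the authors' actual implementation: the class names, dense softmax gating, residual update, and the number of rethinking steps are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One expert realized as a low-rank adapter: x -> B(A x), with rank r << d_model."""
    def __init__(self, d_model: int, rank: int = 8):
        super().__init__()
        self.A = nn.Linear(d_model, rank, bias=False)
        self.B = nn.Linear(rank, d_model, bias=False)
        nn.init.zeros_(self.B.weight)  # standard LoRA init: adapter starts as a no-op

    def forward(self, x):
        return self.B(self.A(x))


class RecurrentRoutedMoE(nn.Module):
    """Mixture of LoRA experts with a recurrent router (illustrative sketch).

    At each step the router re-scores the experts from the current hidden state,
    mixes the expert outputs, and feeds the result back as input to the next step,
    so information can propagate among expert nodes across iterations.
    """
    def __init__(self, d_model: int, num_experts: int = 4,
                 rank: int = 8, think_steps: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            LoRAExpert(d_model, rank) for _ in range(num_experts))
        self.router = nn.Linear(d_model, num_experts)
        self.think_steps = think_steps

    def forward(self, x):
        h = x
        for _ in range(self.think_steps):
            gates = F.softmax(self.router(h), dim=-1)                        # (..., E)
            expert_out = torch.stack([e(h) for e in self.experts], dim=-1)   # (..., d, E)
            mixed = (expert_out * gates.unsqueeze(-2)).sum(-1)               # gated mixture
            h = h + mixed                                                    # one "rethinking" step
        return h


# toy usage: (batch, seq_len, d_model) input
layer = RecurrentRoutedMoE(d_model=64)
out = layer(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```

In this sketch the recurrence is what distinguishes it from a standard MoE layer: a conventional router scores the experts once per token, whereas here the gating decision is revisited at every step using the updated hidden state.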