Existing neural methods for multi-task vehicle routing problems (VRPs) typically learn unified solvers to handle multiple constraints simultaneously. However, they often underutilize the compositional structure of VRP variants, each of which is derivable from a common set of basis VRP variants. This critical oversight causes unified solvers to miss out on the potential benefits of basis solvers, each specialized for a single basis VRP variant. To overcome this limitation, we propose a framework that enables unified solvers to perceive the shared-component nature across VRP variants by proactively reusing basis solvers, while mitigating the exponential growth in the number of trained neural solvers. Specifically, we introduce a State-Decomposable MDP (SDMDP) that reformulates VRPs by expressing the state space as the Cartesian product of the basis state spaces associated with the basis VRP variants. More crucially, this formulation inherently yields an optimal basis policy for each basis VRP variant. Furthermore, we develop a Latent Space-based SDMDP extension that incorporates both the optimal basis policies and a learnable mixture function to enable policy reuse in the latent space. Under mild assumptions, this extension provably recovers the optimal unified policy of the SDMDP through the mixture function, which computes the state embedding as a mapping of the basis state embeddings generated by the optimal basis policies. For practical implementation, we introduce the Mixture-of-Specialized-Experts Solver (MoSES), which realizes the basis policies through specialized Low-Rank Adaptation (LoRA) experts and implements the mixture function via an adaptive gating mechanism. Extensive experiments across VRP variants demonstrate the superiority of MoSES over prior methods.
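To make the MoSES design concrete, below is a minimal PyTorch sketch of one layer that combines specialized LoRA experts with an adaptive gate acting as the learnable mixture function. The class and parameter names (MoLoRALayer, d_model, num_experts, rank) are illustrative assumptions for exposition, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class MoLoRALayer(nn.Module):
    """Illustrative sketch (not the official MoSES code): a frozen shared linear
    layer augmented with K specialized LoRA experts, combined by an adaptive gate."""
    def __init__(self, d_model: int, num_experts: int, rank: int = 8):
        super().__init__()
        # shared backbone weight, kept frozen in this sketch
        self.base = nn.Linear(d_model, d_model)
        for p in self.base.parameters():
            p.requires_grad_(False)
        # one low-rank (A, B) pair per basis-variant expert
        self.lora_A = nn.Parameter(torch.randn(num_experts, d_model, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, rank, d_model))
        # adaptive gating network over the experts
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) state embedding
        weights = torch.softmax(self.gate(x), dim=-1)            # (batch, K)
        expert_out = torch.einsum('bd,kdr,kre->bke',
                                  x, self.lora_A, self.lora_B)   # (batch, K, d_model)
        mixed = torch.einsum('bk,bke->be', weights, expert_out)  # gated mixture
        return self.base(x) + mixed
```

In this sketch the frozen base layer plays the role of the shared backbone, each low-rank (A, B) pair stands in for a basis-variant expert, and the gate mixes the expert outputs per state embedding, mirroring the mixture over basis state embeddings described above.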