We interpret solving the multi-vehicle routing problem as a team Markov game with partially observable costs. For a given set of customers to serve, the playing agents (vehicles) have the common goal of determining the team-optimal agent routes with minimal total cost. Each agent thereby observes only its own cost. Our multi-agent reinforcement learning approach, the so-called multi-agent Neural Rewriter, builds on the single-agent Neural Rewriter to solve the problem by iteratively rewriting solutions. Parallel agent action execution and partial observability require new rewriting rules for the game. We propose the introduction of a so-called pool into the system, which serves as a collection point for unvisited nodes. It enables agents to act simultaneously and exchange nodes in a conflict-free manner. We realize the limited disclosure of agent-specific costs by sharing them only during learning. During inference, each agent acts in a decentralized manner, solely based on its own cost. First empirical results on small problem sizes demonstrate that we reach a performance close to that of the employed OR-Tools benchmark, which operates in the perfect-cost-information setting.
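To make the pool mechanism concrete, below is a minimal Python sketch of one simultaneous rewriting step mediated by such a pool. All names (`Pool`, `try_pick`, `parallel_rewrite_step`) and the random drop/pick choices are illustrative assumptions, not the paper's method: in the actual approach, a learned rewriting policy would select which nodes to move. The sketch only shows why routing exchanges through a shared pool makes parallel agent actions conflict-free.

```python
import random


class Pool:
    """Shared collection point for unvisited customer nodes (hypothetical sketch).

    Agents never hand nodes to each other directly: a node is first dropped
    into the pool and later picked up, so simultaneous rewrites cannot
    assign the same customer to two routes.
    """

    def __init__(self):
        self._nodes = set()

    def drop(self, node):
        self._nodes.add(node)

    def view(self):
        return frozenset(self._nodes)

    def try_pick(self, node):
        """Atomic pick: only the first agent requesting a node obtains it."""
        if node in self._nodes:
            self._nodes.remove(node)
            return True
        return False


def parallel_rewrite_step(routes, pool):
    """One simultaneous rewriting step over all agent routes.

    Each agent may drop one of its nodes into the pool, then each agent may
    pick up one pooled node; try_pick resolves the case where two agents
    request the same node. Node choices here are random placeholders for a
    learned policy.
    """
    for route in routes.values():  # drop phase
        if route:
            route_idx = random.randrange(len(route))
            pool.drop(route.pop(route_idx))
    for route in routes.values():  # pick phase
        for node in pool.view():
            if pool.try_pick(node):
                route.append(node)
                break
    return routes


if __name__ == "__main__":
    # Two vehicles with toy routes over customer indices.
    routes = {"vehicle_0": [3, 5], "vehicle_1": [2, 7, 4]}
    print(parallel_rewrite_step(routes, Pool()))
```

Because every transfer passes through the pool's atomic pick, the same rewriting rules remain valid regardless of how many agents act in the same step.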