This paper addresses collaborative planning problems formalized as Decentralized POMDPs (Dec-POMDPs) by searching for Nash equilibria, i.e., situations where each agent's policy is a best response to the other agents' (fixed) policies. While the Joint Equilibrium-based Search for Policies (JESP) algorithm performs this search in the finite-horizon setting using policy-tree representations, we propose here to adapt it to infinite-horizon Dec-POMDPs by relying on finite-state controller (FSC) policy representations. In this article, we (1) explain how to transform a Dec-POMDP with $N-1$ fixed FSCs into an infinite-horizon POMDP whose optimal solution is a best response for the $N^\text{th}$ agent; (2) propose a JESP variant, called \infJESP, which uses this transformation to solve infinite-horizon Dec-POMDPs; (3) introduce heuristic initializations for JESP aimed at guiding the search toward good solutions; and (4) conduct experiments on state-of-the-art benchmark problems to evaluate our approach.
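To make point (1) concrete, the following is a minimal sketch of the standard best-response construction, written with notation introduced here rather than taken from the abstract: we assume a Dec-POMDP tuple $\langle S, A, T, \Omega, O, r \rangle$, and, for each fixed agent $j \neq i$, an FSC with nodes $n_j$, an action-selection rule $\psi_j(a_j \mid n_j)$, and a node-transition rule $\eta_j(n'_j \mid n_j, a_j, o_j)$. Fixing the $N-1$ other agents' controllers turns the problem faced by agent $i$ into a single-agent POMDP over extended states $(s, \vec{n}_{-i})$ that pair the environment state with the other agents' controller nodes:
\begin{align*}
  \Pr\big(s', \vec{n}'_{-i}, o_i \mid s, \vec{n}_{-i}, a_i\big)
    &= \sum_{\vec{a}_{-i}} \Big[ \prod_{j \neq i} \psi_j(a_j \mid n_j) \Big]\,
       T\big(s' \mid s, (a_i, \vec{a}_{-i})\big) \\
    &\quad \times \sum_{\vec{o}_{-i}} O\big((o_i, \vec{o}_{-i}) \mid s', (a_i, \vec{a}_{-i})\big)
       \prod_{j \neq i} \eta_j\big(n'_j \mid n_j, a_j, o_j\big), \\
  \tilde{r}\big((s, \vec{n}_{-i}), a_i\big)
    &= \sum_{\vec{a}_{-i}} \Big[ \prod_{j \neq i} \psi_j(a_j \mid n_j) \Big]\, r\big(s, (a_i, \vec{a}_{-i})\big).
\end{align*}
Under this construction, any optimal policy of the extended POMDP is, by definition, a best response of agent $i$ to the fixed controllers; \infJESP then iterates such best-response computations over the agents until no agent can improve the joint value, i.e., until a Nash equilibrium is reached.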