The Common Information (CI) approach provides a systematic way to transform a multi-agent stochastic control problem into a single-agent partially observed Markov decision problem (POMDP), called the coordinator's POMDP. However, this POMDP can be hard to solve because its action space is extraordinarily large. We propose a new algorithm for multi-agent stochastic control problems, called coordinator's heuristic search value iteration (CHSVI), that combines the CI approach with point-based POMDP algorithms for large action spaces. We demonstrate the algorithm by solving several benchmark problems to optimality.
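To give a sense of why the coordinator's action space grows so quickly, the following is a minimal sketch, not the CHSVI algorithm itself. It assumes the usual CI setup in which each coordinator action is a joint prescription, that is, one mapping per agent from that agent's private information to an action, and it assumes finite private-information and action sets. The function names and the example sizes are hypothetical and chosen only for illustration.

```python
from itertools import product


def count_prescriptions(num_private_states, num_actions):
    """Count joint prescriptions: the product over agents of |A_i| ** |P_i|.

    num_private_states[i] is the (assumed finite) number of private
    information values of agent i; num_actions[i] is its number of actions.
    """
    total = 1
    for n_private, n_act in zip(num_private_states, num_actions):
        total *= n_act ** n_private
    return total


def enumerate_prescriptions(private_states, actions):
    """Yield every prescription (private state -> action) for a single agent."""
    for choice in product(actions, repeat=len(private_states)):
        yield dict(zip(private_states, choice))


# Hypothetical example: two agents, each with 4 private states and 3 actions.
joint_count = count_prescriptions([4, 4], [3, 3])
print(joint_count)  # 3**4 * 3**4 = 6561
```

Even in this small hypothetical instance, the coordinator chooses among 6561 joint prescriptions per step, whereas each agent has only 3 primitive actions. The count is exponential in the size of the private information, which is the difficulty the abstract refers to.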