A collaborative task is assigned to a multiagent system (MAS) in which agents are allowed to communicate. The MAS runs over an underlying Markov decision process, and its task is to maximize the average sum of discounted one-stage rewards. Although optimal action selection requires knowledge of the global state of the environment, each agent is limited to its own observation. Inter-agent communication can mitigate this local observability; however, its limited rate prevents the agents from acquiring precise global state information. To overcome this challenge, agents need to communicate their observations compactly, so that the MAS sacrifices as little of the achievable sum of rewards as possible. We show that this problem is equivalent to a form of rate-distortion problem, which we call task-based information compression. We introduce a scheme for task-based information compression, titled state aggregation for information compression (SAIC), for which a state aggregation algorithm is analytically designed. SAIC is shown to achieve near-optimal performance in terms of the achieved sum of discounted rewards. The proposed algorithm is applied to a rendezvous problem, and its performance is compared with several benchmarks. Numerical experiments show that the proposed algorithm outperforms these benchmarks.
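To make the idea of compressing observations under a rate limit concrete, the following is a minimal sketch and not the SAIC algorithm itself. It assumes a hypothetical list `values` holding one scalar value estimate per local observation, and a hypothetical budget `rate_bits` of bits per message. Observations with similar values are grouped by a simple one-dimensional k-means, so that at most `2**rate_bits` distinct messages are needed; the function name `aggregate_by_value` and the choice of k-means are assumptions made purely for illustration.

```python
from typing import List


def aggregate_by_value(values: List[float], rate_bits: int, iters: int = 50) -> List[int]:
    """Assign each observation index a message label in [0, 2**rate_bits).

    Observations whose (assumed) value estimates are close share a label,
    so merging them should cost little in terms of achievable reward.
    """
    distinct = sorted(set(values))
    k = min(2 ** rate_bits, len(distinct))  # never more clusters than distinct values

    # Initialise centroids spread evenly over the sorted distinct values.
    if k == 1:
        centroids = [sum(values) / len(values)]
    else:
        n = len(distinct)
        centroids = [distinct[(i * (n - 1)) // (k - 1)] for i in range(k)]

    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid in value space.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # converged
            break
        centroids = updated
    return labels


if __name__ == "__main__":
    # Six hypothetical observations and a 1-bit message budget.
    example_values = [0.0, 0.1, 0.2, 5.0, 5.1, 9.9]
    print(aggregate_by_value(example_values, rate_bits=1))
```

With a 1-bit budget, each agent in this sketch would transmit only the cluster label of its observation rather than the observation itself. Clustering in value space rather than observation space reflects the task-based viewpoint: two observations are merged when confusing them is expected to matter little for the reward.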