A collaborative task is assigned to a multiagent system (MAS) in which agents are allowed to communicate. The MAS operates over an underlying Markov decision process, and its task is to maximize the averaged sum of discounted one-stage rewards. Although knowledge of the global state of the environment is necessary for optimal action selection, each agent is limited to its individual observations. Inter-agent communication can mitigate this local observability; however, the limited rate of inter-agent communication prevents the agents from acquiring precise global state information. To overcome this challenge, agents need to communicate an abstract version of their observations to each other such that the MAS sacrifices as little of the achievable sum of rewards as possible. We show that this problem is equivalent to a form of rate-distortion problem, which we call task-based information compression (TBIC). We introduce state aggregation for information compression (SAIC) to solve the TBIC problem. SAIC is shown to achieve near-optimal performance in terms of the achieved sum of discounted rewards. The proposed algorithm is applied to a rendezvous problem, and its performance is compared with several benchmarks. Numerical experiments confirm that the proposed algorithm outperforms these benchmarks.
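To make the idea of compressing observations under a rate limit concrete, the following is a minimal sketch and not the SAIC algorithm itself, whose details are not given here. It assumes that each local observation already has a scalar value estimate (the numbers in `values` are hypothetical) and that observations with similar values can share one message symbol. The function names `aggregate_states` and `worst_case_value_gap`, and the use of 1-D k-means as the clustering rule, are illustrative assumptions.

```python
def aggregate_states(values, rate_bits, iters=50):
    """Assign each observation to one of at most 2**rate_bits message symbols.

    Assumption (not from the source): observations are grouped by the
    similarity of a precomputed scalar value estimate, using 1-D k-means.
    """
    distinct = sorted(set(values))
    k = min(2 ** rate_bits, len(distinct))
    # Deterministic initialisation: spread centroids over the sorted distinct values.
    centroids = [distinct[(2 * j + 1) * len(distinct) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each observation takes the nearest centroid in value space.
        new_labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid becomes the mean value of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


def worst_case_value_gap(values, labels, centroids):
    """Largest gap between an observation's value and its cluster's mean value."""
    return max(abs(v - centroids[lab]) for v, lab in zip(values, labels))


if __name__ == "__main__":
    # Hypothetical value estimates for six local observations of one agent.
    values = [0.0, 0.1, 0.2, 9.8, 10.0, 10.1]
    for bits in (0, 1, 2):
        labels, centroids = aggregate_states(values, bits)
        print(bits, labels, worst_case_value_gap(values, labels, centroids))
```

In this toy setting, a larger `rate_bits` allows more clusters, so the value gap introduced by aggregation shrinks. This mirrors the rate-distortion trade-off described above, with the distortion measured in terms of task value rather than reconstruction error of the raw observation.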