In multi-agent informative path planning (MAIPP), agents must collectively construct a global belief map of an underlying distribution of interest (e.g., gas concentration, light intensity, or pollution levels) over a given domain, based on measurements taken along their trajectories. They must frequently replan their paths to balance the distributed exploration of new areas with the collective, meticulous exploitation of known high-interest areas, so as to maximize the information gained within a predefined budget (e.g., path length or working time). A common approach to achieving such cooperation relies on planning each agent's path reactively, conditioned on the other agents' predicted future actions. However, because each agent's belief is continuously updated, these predicted actions may not end up being the ones the agents actually execute, introducing a form of noise/inaccuracy into the system that often degrades performance. In this work, we propose a decentralized deep reinforcement learning (DRL) approach to MAIPP, which relies on an attention-based neural network in which agents optimize long-term individual and cooperative objectives by explicitly sharing their intent (i.e., a distribution over their medium-/long-term future positions, obtained from their individual policies) in a reactive, asynchronous manner. That is, in our work, intent sharing allows agents to learn to claim/avoid broader areas of the world. Moreover, since our approach relies on learned attention over these shared intents, agents can learn to recognize the useful portion(s) of these (imperfect) predictions, maximizing cooperation even in the presence of imperfect information. Our comparison experiments demonstrate the advantages of our approach over its variants and high-quality baselines across a large set of MAIPP simulations.