Agents in decentralized multi-agent navigation lack the world knowledge to reliably make safe and (near-)optimal plans. They base their decisions on their neighbors' observable states, which do not reveal the neighbors' navigation intent. We propose augmenting decentralized navigation with inter-agent communication to improve performance and aid agents in making sound navigation decisions. In this regard, we present a novel reinforcement learning method for multi-agent collision avoidance using selective inter-agent communication. Our network learns to decide 'when' and 'with whom' to communicate to request additional information in an end-to-end fashion. We pose communication selection as a link prediction problem, where the network predicts whether communication is necessary given the observable information. The communicated information augments the observed neighbor information to select a suitable navigation plan. Because the number of neighbors a robot observes varies, we use a multi-head self-attention mechanism to encode neighbor information and create a fixed-length observation vector. We validate that our proposed approach achieves safe and efficient navigation among multiple robots in challenging simulation benchmarks. Aided by learned communication, our network performs significantly better than existing decentralized methods across various metrics such as time-to-goal and collision frequency. Furthermore, we show that the network effectively learns to communicate when necessary in highly complex situations.
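The sketch below is a minimal illustration, not the authors' implementation, of the two mechanisms the abstract names: multi-head self-attention that encodes a variable number of neighbors into a fixed-length observation vector, and a per-neighbor link-prediction score that gates communication requests. All module and variable names (NeighborEncoder, link_head, the observation dimensions, and the 0.5 threshold) are assumptions made for illustration.

```python
# Hypothetical sketch: attention-based neighbor encoding with a
# link-prediction head that decides which neighbors to communicate with.
import torch
import torch.nn as nn

class NeighborEncoder(nn.Module):
    def __init__(self, obs_dim=7, embed_dim=64, num_heads=4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)            # per-neighbor embedding
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.link_head = nn.Linear(embed_dim, 1)              # "communicate or not" logit per neighbor

    def forward(self, neighbor_obs):
        # neighbor_obs: (batch, n_neighbors, obs_dim); n_neighbors may vary between rollouts
        h = self.embed(neighbor_obs)
        h, _ = self.attn(h, h, h)                             # self-attention over the neighbor set
        fixed_obs = h.mean(dim=1)                             # pool to a fixed-length observation vector
        link_logits = self.link_head(h).squeeze(-1)           # link-prediction score per neighbor
        return fixed_obs, link_logits

# Usage: neighbors whose link score exceeds a threshold receive a
# communication request; the replies would augment the observation.
encoder = NeighborEncoder()
obs = torch.randn(1, 5, 7)                                    # one robot observing 5 neighbors
fixed_obs, link_logits = encoder(obs)
request_mask = torch.sigmoid(link_logits) > 0.5
```

In an end-to-end setup such as the one described, the link scores and the navigation policy would be trained jointly, so the gradient of the task reward shapes both the encoding and the decision of when and with whom to communicate.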