The Visibility-based Persistent Monitoring (VPM) problem seeks a set of trajectories (or controllers) for robots to persistently monitor a changing environment. Each robot carries a sensor, such as a camera, with a limited field of view that is obstructed by obstacles in the environment. The robots may need to coordinate with each other to ensure that no point in the environment is left unmonitored for long periods of time. We model the problem so that a penalty accrues at every time step a point is left unmonitored; however, the dynamics of the penalty are unknown to us. We present a Multi-Agent Reinforcement Learning (MARL) algorithm for the VPM problem. Specifically, we present a Multi-Agent Graph Attention Proximal Policy Optimization (MA-G-PPO) algorithm that takes as input the local observations of all agents, combined with a low-resolution global map, to learn a policy for each agent. The graph attention allows agents to share their information with others, leading to an effective joint policy. Our main focus is to understand how effective MARL is for the VPM problem, and we investigate five research questions toward this broader goal. We find that MA-G-PPO learns a better policy than the non-RL baseline in most cases, that its effectiveness depends on the agents sharing information with each other, and that the learned policy exhibits emergent behavior for the agents.
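The abstract does not spell out the network details, but the information-sharing step it describes, each agent attending over all agents' encoded observations, can be sketched as below. This is a minimal illustrative sketch, not the paper's architecture: the class name AgentGraphAttention, the single-head formulation, the fully connected agent graph, and all dimensions are assumptions.

```python
# Minimal sketch (assumed, not the paper's exact architecture): one
# graph-attention layer over a fully connected agent graph, letting each
# agent weigh the other agents' features, as in MA-G-PPO's sharing step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AgentGraphAttention(nn.Module):
    """Single-head graph attention over per-agent feature vectors."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.query = nn.Linear(feat_dim, feat_dim)
        self.key = nn.Linear(feat_dim, feat_dim)
        self.value = nn.Linear(feat_dim, feat_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (num_agents, feat_dim) -- per-agent encodings of local
        # observations fused with the low-resolution global map.
        q, k, v = self.query(h), self.key(h), self.value(h)
        scores = q @ k.T / (h.shape[-1] ** 0.5)  # (num_agents, num_agents)
        attn = F.softmax(scores, dim=-1)         # attention over other agents
        return attn @ v                          # aggregated messages per agent

if __name__ == "__main__":
    num_agents, feat_dim = 4, 32                 # placeholder sizes
    h = torch.randn(num_agents, feat_dim)        # dummy agent encodings
    shared = AgentGraphAttention(feat_dim)(h)    # (4, 32) fused features
    print(shared.shape)
```

In a PPO setup, each agent's policy head would then act on its own row of the aggregated features, so the action distribution conditions on information shared by the other agents.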