Autonomous vehicles are well suited to continuous area patrolling problems. However, finding an optimal patrolling strategy can be challenging for several reasons. Firstly, patrolling environments are often complex and can include unknown and evolving environmental factors. Secondly, autonomous vehicles can suffer failures or hardware constraints, such as limited battery life. Importantly, patrolling large areas often requires multiple agents that must collectively coordinate their actions. In this work, we consider these limitations and propose a distributed, model-free multi-agent patrolling strategy based on deep reinforcement learning. In this approach, agents make decisions locally, based on their own environmental observations and on shared information. In addition, agents are trained to automatically recharge themselves when required, supporting continuous collective patrolling. A homogeneous multi-agent architecture is proposed, in which all patrolling agents execute an identical policy. This architecture provides a robust patrolling system that can tolerate agent failures and allows supplementary agents to be added to replace failed agents or to increase the overall patrol performance. This performance is validated through experiments from multiple perspectives, including the overall patrol performance, the efficiency of the battery recharging strategy, the overall robustness of the system, and the agents' ability to adapt to environment dynamics.