In recent years, Landmark Complexes have been successfully employed for localization-free and metric-free autonomous exploration by groups of sensing-limited and communication-limited robots in GPS-denied environments. To ensure rapid and complete exploration, existing works make assumptions about the density and distribution of landmarks in the environment. These assumptions may be overly restrictive, especially in hazardous environments where landmarks may be destroyed or missing entirely. In this paper, we propose a deep reinforcement learning framework for multi-agent cooperative exploration in environments with sparse landmarks that also reduces client-server communication. By leveraging recent developments in partial observability and credit assignment, our framework can train the exploration policy efficiently for multi-robot systems. The policy receives individual rewards for actions based on a proximity sensor with limited range and resolution; these are combined with group rewards to encourage collaborative exploration and construction of the Landmark Complex through observation of 0-, 1-, and 2-dimensional simplices. In addition, we employ a three-stage curriculum learning strategy to mitigate reward sparsity by gradually adding random obstacles and destroying random landmarks. Experiments in simulation demonstrate that our method outperforms the state-of-the-art Landmark Complex exploration method in efficiency across different environments with sparse landmarks.
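The reward structure and curriculum described above can be illustrated with a minimal sketch. It is not the authors' implementation: the stage parameters, the per-dimension weights, the `group_weight` mixing factor, and every function and class name below are hypothetical, chosen only to show how an individual reward might be combined with a group reward for newly observed 0-, 1-, and 2-dimensional simplices, and how a three-stage schedule might add obstacles and remove landmarks over time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CurriculumStage:
    """One stage of a hypothetical three-stage curriculum."""
    name: str
    obstacle_fraction: float       # share of free cells turned into random obstacles
    landmark_drop_fraction: float  # share of landmarks destroyed at random


# Illustrative values only; the abstract does not specify them.
STAGES = (
    CurriculumStage("no obstacles, all landmarks", 0.0, 0.0),
    CurriculumStage("some obstacles, some landmarks destroyed", 0.1, 0.3),
    CurriculumStage("more obstacles, sparse landmarks", 0.2, 0.6),
)


def stage_for_episode(episode: int, episodes_per_stage: int = 1000) -> CurriculumStage:
    """Advance one stage every `episodes_per_stage` episodes, then stay at the last."""
    index = min(episode // episodes_per_stage, len(STAGES) - 1)
    return STAGES[index]


def group_reward(known: set, observed: set, weights=(1.0, 2.0, 3.0)) -> float:
    """Reward simplices not yet in the shared complex.

    A simplex is a frozenset of landmark ids, so its dimension is len(s) - 1:
    vertices (0), edges (1), and triangles (2). The weights are assumptions.
    """
    newly_seen = observed - known
    return sum(weights[len(simplex) - 1] for simplex in newly_seen)


def combined_reward(individual: float, group: float, group_weight: float = 0.5) -> float:
    """Mix an agent's own reward with the team reward (assumed linear mix)."""
    return individual + group_weight * group


if __name__ == "__main__":
    known = {frozenset({1})}
    observed = {
        frozenset({1}),
        frozenset({2}),
        frozenset({1, 2}),
        frozenset({1, 2, 3}),
    }
    team = group_reward(known, observed)
    print(stage_for_episode(2500).name, combined_reward(0.5, team))
```

A linear mix is used here only because it is the simplest way to show the two reward terms side by side; the actual credit-assignment mechanism in the paper may differ.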