Recent work has demonstrated that navigation policies learned for idealized cylinder agents in simulation can be transferred to real wheeled robots. Deploying such policies on legged robots is challenging because of their complex dynamics and the large dynamical gap between cylinder agents and legged systems. In this work, we learn hierarchical navigation policies that account for the low-level dynamics of legged robots, such as maximum speed, slipping, and contacts, and that successfully navigate cluttered indoor environments. To enable transfer of policies learned in simulation to new legged robots and to hardware, we learn dynamics-aware navigation policies across multiple robots with robot-specific embeddings. On a new robot, only the learned embedding is optimized while the rest of the policy is kept fixed, allowing for quick adaptation. We train our policies across three legged robots in simulation: two quadrupeds (A1, AlienGo) and a hexapod (Daisy). At test time, we study the performance of our learned policy on two new legged robots in simulation (Laikago, 4-legged Daisy) and on one real-world quadrupedal robot (A1). Our experiments show that the learned policy generalizes sample-efficiently to previously unseen robots and enables sim-to-real transfer of navigation policies for legged robots.
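The adaptation step described above (optimize only the robot-specific embedding, keep the rest of the policy fixed) can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the policy is a single frozen linear layer followed by a sigmoid, the return function is a hand-written stand-in that rewards progress and penalizes speed commands above the robot's maximum speed, and the optimizer is simple random hill climbing. The names `FROZEN_WEIGHTS`, `commanded_speed`, `episode_return`, and `adapt_embedding` are hypothetical.

```python
import math
import random

# Frozen policy parameters (hypothetical values standing in for trained weights).
FROZEN_WEIGHTS = (0.8, -0.5)
FROZEN_BIAS = 0.1


def commanded_speed(embedding, weights=FROZEN_WEIGHTS, bias=FROZEN_BIAS):
    """Toy policy: map the robot embedding to a speed command in (0, 2) m/s."""
    pre = sum(w * z for w, z in zip(weights, embedding)) + bias
    return 2.0 / (1.0 + math.exp(-pre))


def episode_return(embedding, max_speed):
    """Toy return: reward progress, penalize commands the robot cannot track."""
    cmd = commanded_speed(embedding)
    progress = min(cmd, max_speed)
    overshoot = max(0.0, cmd - max_speed)
    return progress - 3.0 * overshoot


def adapt_embedding(max_speed, dim=2, iterations=200, step=0.3, seed=0):
    """Hill-climb over the embedding only; the policy weights are never modified."""
    rng = random.Random(seed)
    best = [0.0] * dim
    best_return = episode_return(best, max_speed)
    for _ in range(iterations):
        candidate = [z + rng.gauss(0.0, step) for z in best]
        candidate_return = episode_return(candidate, max_speed)
        if candidate_return > best_return:
            best, best_return = candidate, candidate_return
    return best, best_return


# A "new robot" whose maximum speed (0.6 m/s) is unknown to the frozen policy.
initial_return = episode_return([0.0, 0.0], max_speed=0.6)
adapted, adapted_return = adapt_embedding(max_speed=0.6)
```

Because the search space is a low-dimensional embedding rather than the full set of policy weights, each candidate needs only a rollout to evaluate, which is the intuition behind the quick adaptation claimed above; the actual optimizer and embedding size used in the work may differ from this sketch.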