Navigation policies are commonly learned on idealized cylinder agents in simulation, without modelling complex dynamics, such as contact dynamics, arising from the interaction between the robot and the environment. Such policies perform poorly when deployed on complex and dynamic robots, such as legged robots. In this work, we learn hierarchical navigation policies that account for the low-level dynamics of legged robots, such as maximum speed and slipping, and achieve good performance when navigating cluttered indoor environments. Once such a policy is learned on one legged robot, it does not directly generalize to a different robot due to dynamical differences, which increases the cost of learning such a policy on new robots. To overcome this challenge, we learn dynamics-aware navigation policies across multiple robots with robot-specific embeddings, which enable generalization to new, unseen robots. We train our policies across three legged robots: two quadrupeds (A1, AlienGo) and a hexapod (Daisy). At test time, we study the performance of our learned policy on two new legged robots (Laikago, a 4-legged Daisy) and show that it can sample-efficiently generalize to previously unseen robots.
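The core mechanism described above is a shared policy conditioned on a per-robot embedding, so that adapting to an unseen robot only requires fitting a new embedding vector while the shared weights stay fixed. The following is a minimal sketch of that conditioning pattern, assuming illustrative dimensions and robot names; the network structure and training details are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


class EmbeddingConditionedPolicy:
    """Sketch of a high-level navigation policy conditioned on a
    robot-specific embedding. Layer sizes and the two-layer MLP are
    illustrative assumptions, not the authors' architecture."""

    def __init__(self, obs_dim, embed_dim, act_dim, hidden=32):
        # Shared policy weights, trained jointly across all robots.
        self.W1 = rng.normal(0.0, 0.1, (obs_dim + embed_dim, hidden))
        self.W2 = rng.normal(0.0, 0.1, (hidden, act_dim))
        self.embed_dim = embed_dim
        # One learnable embedding per training robot.
        self.embeddings = {
            name: rng.normal(0.0, 0.1, embed_dim)
            for name in ("A1", "AlienGo", "Daisy")
        }

    def add_robot(self, name):
        # Generalizing to an unseen robot only requires learning a new
        # embedding; the shared weights W1, W2 stay frozen.
        self.embeddings[name] = rng.normal(0.0, 0.1, self.embed_dim)

    def act(self, obs, robot):
        # Concatenate observation with the robot's embedding, then run
        # the shared MLP to produce a high-level command (e.g. velocity).
        z = np.concatenate([obs, self.embeddings[robot]])
        h = np.tanh(z @ self.W1)
        return np.tanh(h @ self.W2)


policy = EmbeddingConditionedPolicy(obs_dim=10, embed_dim=4, act_dim=2)
policy.add_robot("Laikago")  # unseen at training time
cmd = policy.act(np.zeros(10), "Laikago")
print(cmd.shape)
```

Because only the embedding is new for "Laikago", the number of parameters to adapt at test time is `embed_dim` rather than the full network, which is what makes the generalization sample-efficient.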