The need for rapid and reliable robot deployment is on the rise. Imitation Learning (IL) has become popular for producing motion planning policies from a set of demonstrations. However, many IL methods are not guaranteed to produce stable policies. The generated policy may fail to converge to the robot's target, which reduces reliability, and may collide with the environment, which reduces the safety of the system. The Stable Estimator of Dynamical Systems (SEDS) produces stable policies by enforcing Lyapunov stability criteria as constraints during learning, but the Lyapunov candidate function must be selected manually. In this work, we propose a novel method for learning a Lyapunov function and a policy using a single neural network model. The method can be equipped with an obstacle avoidance module for convex object pairs to guarantee collision-free motion. We demonstrate that our method finds policies in several simulation environments and transfers to a real-world scenario.
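To make the Lyapunov stability criteria mentioned above concrete, the following is a minimal sketch and not the method proposed in this work. It assumes a hand-picked quadratic candidate V(x) = ||x - x*||^2 and a simple linear policy; the names `lyapunov`, `policy`, `lyapunov_rate`, and `rollout` are hypothetical. It checks that V decreases along the policy (V_dot < 0 away from the target) and that a rollout converges to the target.

```python
# Illustrative only: a hand-picked quadratic Lyapunov candidate and a linear
# policy, NOT the learned neural network described in the abstract.

def lyapunov(x, target):
    """V(x) = ||x - target||^2: zero at the target, positive elsewhere."""
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

def lyapunov_grad(x, target):
    """Gradient of V with respect to x."""
    return [2.0 * (xi - ti) for xi, ti in zip(x, target)]

def policy(x, target, gain=1.0):
    """Velocity command x_dot = -gain * (x - target) (assumed form)."""
    return [-gain * (xi - ti) for xi, ti in zip(x, target)]

def lyapunov_rate(x, target):
    """V_dot = grad V(x) . f(x); stability requires this to be negative
    everywhere except at the target."""
    grad = lyapunov_grad(x, target)
    vel = policy(x, target)
    return sum(g * v for g, v in zip(grad, vel))

def rollout(x0, target, dt=0.05, steps=400):
    """Integrate the policy with forward Euler and record V at each step."""
    x = list(x0)
    values = [lyapunov(x, target)]
    for _ in range(steps):
        vel = policy(x, target)
        x = [xi + dt * vi for xi, vi in zip(x, vel)]
        values.append(lyapunov(x, target))
    return x, values

if __name__ == "__main__":
    start, goal = [1.0, -2.0], [0.5, 0.5]
    final, values = rollout(start, goal)
    print("V_dot at start:", lyapunov_rate(start, goal))
    print("V non-increasing:", all(b <= a for a, b in zip(values, values[1:])))
    print("final state:", final)
```

In this toy case the candidate function is chosen by hand, which is the manual step attributed to SEDS above; the abstract's proposal is to learn the function together with the policy instead.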