This paper develops the Continuous Pontryagin Differentiable Programming (Continuous PDP) method, which enables a robot to learn a control utility function from a small number of sparsely demonstrated keyframes. The keyframes are a few desired sequential outputs that the robot is expected to follow at certain time instances. The duration of the keyframes may differ from that of the robot's actual execution. The method jointly searches for a robot control utility function and a time-warping function such that the robot motion sequentially follows the given keyframes with minimal discrepancy loss. Continuous PDP minimizes the discrepancy loss using projected gradient descent, efficiently computing the gradient of the robot motion with respect to the unknown parameters. The method is first evaluated on a simulated two-link robot arm and then applied to a 6-DoF maneuvering quadrotor to learn a utility function from keyframes for motion planning in unmodeled environments with obstacles. The results show the efficiency of the method, its ability to handle time misalignment between keyframes and robot execution, and the generalization of the learned utility function to unseen motion conditions.
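To make the joint search concrete, the following is a minimal sketch of projected gradient descent over one motion parameter and one time-warping parameter. It is not the paper's implementation. Everything in it is an assumption chosen for illustration: the closed-form toy output `y(t) = theta * (1 - exp(-t))` stands in for the robot motion produced by an optimal control system, the linear warp `t = beta * tau` stands in for the time-warping function, the hand-derived gradient stands in for the gradient that Continuous PDP obtains by differentiating the motion with respect to the parameters, and the box bounds, step size, and all function names are hypothetical.

```python
import math


def rollout(theta, beta, taus):
    """Toy output y(t) = theta * (1 - exp(-t)) at warped times t = beta * tau.

    Assumption: this closed form replaces the robot motion; beta * tau is a
    linear time warp from keyframe time tau to execution time t.
    """
    return [theta * (1.0 - math.exp(-beta * tau)) for tau in taus]


def loss_and_grad(theta, beta, taus, keyframes):
    """Squared discrepancy to the keyframes and its gradient in (theta, beta)."""
    loss, g_theta, g_beta = 0.0, 0.0, 0.0
    for tau, y_star in zip(taus, keyframes):
        decay = math.exp(-beta * tau)
        residual = theta * (1.0 - decay) - y_star
        loss += residual * residual
        g_theta += 2.0 * residual * (1.0 - decay)          # d(loss)/d(theta)
        g_beta += 2.0 * residual * theta * tau * decay     # d(loss)/d(beta)
    return loss, g_theta, g_beta


def project(value, lower, upper):
    """Euclidean projection of a scalar onto the interval [lower, upper]."""
    return min(max(value, lower), upper)


def fit(taus, keyframes, theta=1.0, beta=1.0, step=0.05, iters=20000):
    """Projected gradient descent: gradient step, then projection onto a box.

    The bounds (theta in [0, 10], beta in [0.1, 10]) are illustrative; the
    lower bound on beta keeps the warped time strictly increasing.
    """
    for _ in range(iters):
        _, g_theta, g_beta = loss_and_grad(theta, beta, taus, keyframes)
        theta = project(theta - step * g_theta, 0.0, 10.0)
        beta = project(beta - step * g_beta, 0.1, 10.0)
    return theta, beta


# Four sparse keyframes generated from known parameters (theta=2.0, beta=1.5).
taus = [0.2, 0.6, 1.0, 1.6]
keyframes = rollout(2.0, 1.5, taus)

initial_loss = loss_and_grad(1.0, 1.0, taus, keyframes)[0]
theta_hat, beta_hat = fit(taus, keyframes)
final_loss = loss_and_grad(theta_hat, beta_hat, taus, keyframes)[0]
print(theta_hat, beta_hat, final_loss)
```

Because the keyframes here are noise-free and generated by the same toy model, the estimates should approach the generating values. With real demonstrations the loss would generally not reach zero, and the warp would absorb the mismatch between keyframe timing and execution timing.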