This paper develops Continuous Pontryagin Differentiable Programming (Continuous PDP), a method that enables a robot to learn an objective function from a few sparsely demonstrated keyframes. The keyframes, each labeled with a time stamp, are desired task-space outputs that the robot is expected to follow sequentially. The time stamps of the keyframes may differ from the timing of the robot's actual execution. The method jointly finds an objective function and a time-warping function such that the robot's resulting trajectory sequentially follows the keyframes with minimal discrepancy loss. Continuous PDP minimizes the discrepancy loss using projected gradient descent, efficiently computing the gradient of the robot trajectory with respect to the unknown parameters. The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments. The results show the efficiency of the method, its ability to handle time misalignment between the keyframes and the robot's execution, and the generalization of the learned objective to unseen motion conditions.
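The joint search over an objective parameter and a time-warping parameter by projected gradient descent can be illustrated with a minimal sketch. It is a toy stand-in, not the paper's algorithm: the robot trajectory is assumed to have a closed form in a scalar parameter `theta`, whereas Continuous PDP would obtain the trajectory and its gradient by solving an optimal control problem. The names `trajectory`, `loss_and_grad`, `fit`, the linear time warp `tau = beta * t`, the box constraints, and the keyframe values are all hypothetical choices made for illustration.

```python
# Toy sketch of projected gradient descent over (theta, beta).
# ASSUMPTION: the trajectory has the closed form below; in the paper's
# setting it would come from solving an optimal control problem.

def trajectory(tau, theta):
    """Hypothetical 1-D task-space output at execution time tau."""
    return theta * tau ** 2 + (1.0 - theta) * tau


def loss_and_grad(theta, beta, keyframes):
    """Squared keyframe discrepancy and its analytic gradient."""
    loss = g_theta = g_beta = 0.0
    for t, y in keyframes:
        tau = beta * t                      # assumed linear time warp
        r = trajectory(tau, theta) - y      # residual at this keyframe
        loss += r * r
        g_theta += 2.0 * r * (tau ** 2 - tau)
        g_beta += 2.0 * r * (2.0 * theta * tau + 1.0 - theta) * t
    return loss, g_theta, g_beta


def clip(value, lo, hi):
    """Projection onto the interval [lo, hi]."""
    return max(lo, min(hi, value))


def fit(keyframes, theta=0.0, beta=1.0, step=0.02, iters=5000):
    """Gradient step followed by projection onto box constraints."""
    for _ in range(iters):
        _, g_theta, g_beta = loss_and_grad(theta, beta, keyframes)
        theta = clip(theta - step * g_theta, 0.0, 1.0)
        beta = clip(beta - step * g_beta, 0.1, 2.0)
    return theta, beta


# Keyframes (time stamp, desired output) generated with theta = 0.5 and
# beta = 0.5, so the demonstration clock runs at twice the execution time.
KEYFRAMES = [(0.4, 0.12), (1.0, 0.375), (1.6, 0.72), (2.0, 1.0)]

if __name__ == "__main__":
    theta, beta = fit(KEYFRAMES)
    print(theta, beta, loss_and_grad(theta, beta, KEYFRAMES)[0])
```

In this toy problem the time stamps of the keyframes deliberately disagree with the execution time, so the discrepancy can only be driven down by adjusting the time-warp parameter together with the objective parameter.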