In this paper, we present a learning-based approach that allows a robot to quickly follow a reference path defined in joint space without exceeding limits on the position, velocity, acceleration, and jerk of each robot joint. In contrast to offline methods for time-optimal path parameterization, the reference path can be changed during motion execution. In addition, our approach can utilize sensory feedback, for instance, to follow a reference path with a bipedal robot without losing balance. With our method, the robot is controlled by a neural network that is trained via reinforcement learning using data generated by a physics simulator. From a mathematical perspective, the problem of tracking a reference path in a time-optimized manner is formalized as a Markov decision process. Each state includes a fixed number of waypoints specifying the next part of the reference path. The action space is designed in such a way that all resulting motions comply with the specified kinematic joint limits. Finally, the reward function reflects the trade-off between the execution time, the deviation from the desired reference path, and optional additional objectives such as balancing. We evaluate our approach with and without additional objectives and show that time-optimized path tracking can be successfully learned for both industrial and humanoid robots. In addition, we demonstrate that networks trained in simulation can be successfully transferred to a real robot.
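The trade-off encoded by the reward function can be illustrated with a minimal sketch. The weights, the Euclidean deviation metric, and the additive penalty for extra objectives are assumptions chosen for illustration, not the paper's actual formulation:

```python
import numpy as np

def reward(step_duration, joint_pos, reference_pos,
           w_time=1.0, w_dev=5.0, balance_penalty=0.0):
    """Illustrative reward: favor short execution time and small
    deviation from the reference path. Optional objectives such as
    balancing enter as an additional penalty term."""
    deviation = np.linalg.norm(np.asarray(joint_pos) - np.asarray(reference_pos))
    return -w_time * step_duration - w_dev * deviation - balance_penalty

# A fast, accurate step scores higher than a slow, inaccurate one.
r_good = reward(0.01, [0.1, 0.2], [0.1, 0.2])
r_bad = reward(0.05, [0.1, 0.2], [0.3, 0.0])
```

Under these assumed weights, the agent is rewarded for minimizing both time per step and tracking error, so maximizing the return pushes the policy toward time-optimized yet accurate path tracking.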