Finding an efficient way to adapt robot trajectories is a priority for improving overall robot performance. One approach to trajectory planning is to transfer human-like skills to robots through Learning from Demonstrations (LfD), in which the human demonstration is treated as the target motion to mimic. However, human motion is typically optimal for the human embodiment but not for robots, because human biomechanics and robot dynamics differ. The Dynamic Movement Primitives (DMP) framework is a viable solution to this limitation of LfD, but it requires tuning the second-order dynamics in its formulation. Our contribution is a systematic method that extracts dynamic features from human demonstrations to auto-tune the parameters of the DMP framework. Beyond its use with LfD, the proposed method can readily be used in conjunction with Reinforcement Learning (RL) for robot training. In this way, the extracted features facilitate the transfer of human skills by allowing the robot to explore the possible trajectories more efficiently and by significantly increasing robot compliance. The method extracts the dynamic features from multiple trajectories by optimizing human-likeness and similarity in the parametric space. We implemented the method in an actual human-robot setup to extract human dynamic features and used them to regenerate robot trajectories following both LfD and RL with DMP. The robot performed stably and maintained a high degree of human-likeness, with an accumulated distance error as good as that of the best heuristic tuning.
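To make concrete what "tuning the second-order dynamics" refers to, the following is a minimal sketch of the standard one-dimensional DMP transformation system, integrated with explicit Euler steps. It is an illustration, not the method proposed here: the function name `rollout_dmp`, the parameter names `alpha_z`, `beta_z`, `alpha_x`, and `tau`, and the default values are all assumptions chosen for the example. The gains `alpha_z` and `beta_z` are the kind of spring-damper parameters that would otherwise be set heuristically.

```python
def rollout_dmp(y0, goal, alpha_z=25.0, beta_z=None, alpha_x=3.0,
                tau=1.0, dt=0.001, duration=2.0, forcing=None):
    """Integrate a 1-D DMP (illustrative sketch, hypothetical names).

    Transformation system:  tau * dz/dt = alpha_z * (beta_z * (goal - y) - z) + f(x)
                            tau * dy/dt = z
    Canonical system:       tau * dx/dt = -alpha_x * x
    """
    if beta_z is None:
        # beta_z = alpha_z / 4 makes the unforced system critically damped.
        beta_z = alpha_z / 4.0
    if forcing is None:
        # Assumption: no learned forcing term, so only the
        # spring-damper behaviour is visible.
        forcing = lambda x: 0.0

    y, z, x = float(y0), 0.0, 1.0
    trajectory = [y]
    for _ in range(int(round(duration / dt))):
        # Evaluate all derivatives from the current state (explicit Euler).
        dz = (alpha_z * (beta_z * (goal - y) - z) + forcing(x)) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z += dz * dt
        y += dy * dt
        x += dx * dt
        trajectory.append(y)
    return trajectory


if __name__ == "__main__":
    damped = rollout_dmp(0.0, 1.0)                  # critically damped gains
    stiff = rollout_dmp(0.0, 1.0, beta_z=100.0)     # under-damped gains
    print("final position (critically damped):", damped[-1])
    print("peak position (under-damped):", max(stiff))
```

With the critically damped default the state approaches the goal without overshoot, whereas a larger `beta_z` makes the same system oscillate around the goal. This sensitivity to the gains is what motivates tuning them from data rather than by hand.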