This paper presents an approach for learning the online generation of collision-free and torque-limited robot trajectories. To generate future motions, a neural network is invoked periodically. Based on the current kinematic state of the robot and the network output, a trajectory for the current time interval is calculated. The key idea of our paper is to execute the computed motion only if a collision-free and torque-limited way to continue the trajectory is known. Specifically, the motion computed for the current time interval is extended by a braking trajectory and simulated using a physics engine. If the simulated trajectory complies with all safety constraints, the computed motion is carried out. Otherwise, the braking trajectory calculated in the previous time interval serves as an alternative safe behavior. Given a task-specific reward function, the neural network is trained using reinforcement learning. The design of the action space used for reinforcement learning ensures that all computed trajectories comply with kinematic joint limits. For our evaluation, simulated humanoid robots and industrial robots are trained to reach as many randomly placed target points as possible. We show that our method reliably prevents collisions with static obstacles and collisions between the robot arms, while generating motions that respect both torque limits and kinematic joint limits. Experiments with a real robot demonstrate that safe trajectories can be generated in real time.
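The execute-or-fall-back rule described above can be illustrated with a minimal single-joint sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the limits, the toy torque model `tau = INERTIA * acc + GRAVITY * sin(pos)`, the obstacle-free range `P_FREE` standing in for collision checking, and the helper names `action_to_acc`, `braking_accs`, `rollout_is_safe`, and `shielded_step` are all hypothetical. The paper simulates multi-joint robots with a physics engine; here `rollout_is_safe` only checks constraints at interval boundaries, and `action_to_acc` is a simplified stand-in that enforces acceleration and velocity limits but not the paper's full action-space design. The joint is assumed to start at rest inside `P_FREE`.

```python
import math
from dataclasses import dataclass

# Hypothetical single-joint limits and toy model, for illustration only.
DT = 0.1              # duration of one time interval [s]
A_MAX = 2.0           # acceleration limit [rad/s^2]
V_MAX = 1.0           # velocity limit [rad/s]
INERTIA = 0.5         # toy model: tau = INERTIA * acc + GRAVITY * sin(pos)
GRAVITY = 0.5
TAU_MAX = 1.3         # torque limit [Nm]
P_FREE = (-1.0, 1.0)  # obstacle-free position range, stands in for collision checks


@dataclass
class State:
    pos: float
    vel: float


def step(s: State, acc: float) -> State:
    """Integrate one interval of duration DT with constant acceleration."""
    return State(s.pos + s.vel * DT + 0.5 * acc * DT * DT, s.vel + acc * DT)


def action_to_acc(action: float, s: State) -> float:
    """Map an action in [-1, 1] onto accelerations that keep |acc| <= A_MAX
    and |vel| <= V_MAX at the end of the interval (simplified assumption)."""
    lo = max(-A_MAX, (-V_MAX - s.vel) / DT)
    hi = min(A_MAX, (V_MAX - s.vel) / DT)
    a = min(max(action, -1.0), 1.0)
    return lo + 0.5 * (a + 1.0) * (hi - lo)


def braking_accs(s: State) -> list[float]:
    """Accelerations (one per interval) that stop the joint as fast as A_MAX allows."""
    accs, v = [], s.vel
    while abs(v) > 1e-9:
        # Full deceleration, except in the last interval, which stops exactly.
        acc = -v / DT if abs(v) <= A_MAX * DT else -math.copysign(A_MAX, v)
        accs.append(acc)
        v += acc * DT
    return accs


def torque(s: State, acc: float) -> float:
    """Toy inverse dynamics (hypothetical): inertia term plus gravity term."""
    return INERTIA * acc + GRAVITY * math.sin(s.pos)


def rollout_is_safe(s: State, accs: list[float]) -> bool:
    """Stand-in for the physics-engine simulation: checks the torque limit at the
    start of each interval and the obstacle-free range at its end (coarse)."""
    for acc in accs:
        if abs(torque(s, acc)) > TAU_MAX:
            return False
        s = step(s, acc)
        if not P_FREE[0] <= s.pos <= P_FREE[1]:
            return False
    return True


def shielded_step(s: State, action: float, backup: list[float]) -> tuple[State, list[float]]:
    """Execute the network's motion only if it can be followed by a safe braking
    trajectory; otherwise continue the previously verified braking trajectory."""
    acc = action_to_acc(action, s)
    nxt = step(s, acc)
    brake = braking_accs(nxt)
    if rollout_is_safe(s, [acc] + brake):
        return nxt, brake                      # verified braking trajectory becomes the new backup
    if backup:
        return step(s, backup[0]), backup[1:]  # fall back to the previous braking trajectory
    return State(s.pos, 0.0), []               # backup exhausted: the joint is already at rest


if __name__ == "__main__":
    state, backup = State(0.0, 0.0), []        # start at rest inside P_FREE
    for _ in range(50):
        # A constant "push toward the obstacle" action replaces the neural network here.
        state, backup = shielded_step(state, 1.0, backup)
    print(state)
```

Because every accepted interval stores the braking trajectory that was verified together with it, the fallback always starts from the exact state the previous check began its braking rollout from, so the robot never needs an unverified motion to stay safe.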