This paper presents an approach for learning to generate collision-free and torque-limited robot trajectories online. To generate future motions, a neural network is invoked periodically. Based on the current kinematic state of the robot and the network prediction, a trajectory for the current time interval is calculated. The main idea of our paper is to execute the predicted motion only if a collision-free and torque-limited way to continue the trajectory is known. In practice, the motion predicted for the current time interval is extended by a braking trajectory and simulated using a physics engine. If the simulated trajectory complies with all safety constraints, the predicted motion is carried out. Otherwise, the braking trajectory calculated in the previous time interval serves as an alternative safe behavior. Given a task-specific reward function, the neural network is trained using reinforcement learning. The design of the action space used for reinforcement learning ensures that all predicted trajectories comply with kinematic joint limits. For our evaluation, simulated industrial robots and humanoid robots are trained to reach as many randomly placed target points as possible. We show that our method reliably prevents collisions with static obstacles and collisions between the robot arms, while generating motions that respect both torque limits and kinematic joint limits. Experiments with a real robot demonstrate that safe trajectories can be generated in real time.
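The accept-or-brake rule described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single joint with constant acceleration per time interval, and it replaces the physics-engine check for collisions and torque limits with a user-supplied predicate. All names (`State`, `brake_step`, `safe_step`, `is_safe`) and all parameter values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class State:
    pos: float  # joint position
    vel: float  # joint velocity


def step(state: State, acc: float, dt: float) -> State:
    """Integrate one time interval with constant acceleration."""
    return State(state.pos + state.vel * dt + 0.5 * acc * dt * dt,
                 state.vel + acc * dt)


def brake_step(state: State, a_max: float, dt: float) -> State:
    """One interval of braking; the velocity becomes exactly zero at standstill."""
    if abs(state.vel) <= a_max * dt:
        # Standstill is reached within this interval (constant deceleration).
        return State(state.pos + 0.5 * state.vel * dt, 0.0)
    return step(state, -a_max if state.vel > 0 else a_max, dt)


def braking_trajectory(state: State, a_max: float, dt: float) -> List[State]:
    """States visited while braking from `state` to standstill."""
    traj = []
    while state.vel != 0.0:
        state = brake_step(state, a_max, dt)
        traj.append(state)
    return traj


def safe_step(state: State, proposed_acc: float, a_max: float, dt: float,
              is_safe: Callable[[State], bool]) -> Tuple[State, bool]:
    """Execute the proposed motion only if it can be extended safely by braking.

    `is_safe` is an assumed stand-in for the simulated safety check.
    Returns the next state and whether the proposal was accepted.
    """
    acc = max(-a_max, min(a_max, proposed_acc))  # keep within the assumed limit
    candidate = step(state, acc, dt)
    extended = [candidate] + braking_trajectory(candidate, a_max, dt)
    if all(is_safe(s) for s in extended):
        return candidate, True
    # Fallback: continue along the braking trajectory verified one interval earlier.
    return brake_step(state, a_max, dt), False
```

Because braking is deterministic in this sketch, recomputing the braking step from the current state yields the same motion as the braking trajectory verified in the previous interval. Starting from a safe standstill, every executed state has therefore been checked by `is_safe`, regardless of the accelerations proposed.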