We present an approach for learning fast and dynamic robot motions without exceeding limits on the position $\theta$, velocity $\dot{\theta}$, acceleration $\ddot{\theta}$, and jerk $\dddot{\theta}$ of each robot joint. Movements are generated by mapping the predictions of a neural network to safely executable joint accelerations. The neural network is invoked periodically and trained via reinforcement learning. Our main contribution is an analytical procedure for calculating safe joint accelerations that takes the prediction frequency $f_N$ of the neural network into account. As a result, $f_N$ can be chosen freely and treated as a hyperparameter. We show that our approach is preferable to penalizing constraint violations, as it provides explicit guarantees and does not distort the desired optimization target. In addition, various experiments highlight the influence of the selected prediction frequency on learning performance and computational effort.
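
To make the idea of mapping a network prediction to a safely executable acceleration concrete, the following is a minimal sketch for a single joint. It is deliberately simplified and is not the analytical procedure of the paper: it enforces only the acceleration and jerk limits over one prediction interval $1/f_N$ and omits the velocity and position limits. The function names, the symmetric limits, the linear mapping, and the assumption that the prediction lies in $[-1, 1]$ are all illustrative choices.

```python
def safe_acceleration_range(a_cur, a_max, j_max, f_n):
    """Accelerations reachable after one prediction step of length 1 / f_n.

    Simplified illustration: only the acceleration limit (+/- a_max) and the
    jerk limit (+/- j_max) are enforced. Velocity and position limits, which
    the full approach also considers, are NOT handled here.
    Assumes -a_max <= a_cur <= a_max.
    """
    dt = 1.0 / f_n  # time between two network predictions
    lower = max(-a_max, a_cur - j_max * dt)
    upper = min(a_max, a_cur + j_max * dt)
    return lower, upper


def map_prediction(m, a_cur, a_max, j_max, f_n):
    """Linearly map a network output m in [-1, 1] into the safe range."""
    m = max(-1.0, min(1.0, m))  # clip the prediction (assumed output range)
    lower, upper = safe_acceleration_range(a_cur, a_max, j_max, f_n)
    return lower + 0.5 * (m + 1.0) * (upper - lower)


if __name__ == "__main__":
    # Hypothetical limits: a_max = 10 rad/s^2, j_max = 100 rad/s^3, f_N = 20 Hz
    for m in (-1.0, 0.0, 1.0):
        print(m, map_prediction(m, a_cur=0.0, a_max=10.0, j_max=100.0, f_n=20.0))
```

Because every possible prediction lands inside the admissible range by construction, the limits covered by the sketch cannot be violated, and no penalty term is needed in the reward. A lower $f_N$ lengthens the interval and therefore widens the range reachable under the jerk limit.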