Learning to play table tennis is a challenging task for robots, owing to the variety of strokes required. Recent advances in deep Reinforcement Learning (RL) have shown potential for learning optimal strokes. However, the large amount of exploration required still limits the applicability of RL in real scenarios. In this paper, we first propose a realistic simulation environment in which several models are built for the ball's dynamics and the robot's kinematics. Instead of training an end-to-end RL model, we decompose the task into two stages: predicting the ball's hitting state and then learning the racket strokes from that prediction. A novel policy gradient approach with a TD3 backbone is proposed for the second stage. In the experiments, we show that the proposed approach significantly outperforms existing RL methods in simulation. To bridge the domain gap from simulation to reality, we develop an efficient retraining method and test it in three real scenarios, achieving a success rate of 98%.
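To make the two-stage decomposition concrete, the minimal sketch below shows how a TD3-style agent could fit the second stage: a deterministic actor maps a predicted hitting state to stroke parameters, and twin Q-networks (the defining TD3 ingredient) score state-action pairs. This is not the paper's code; all dimensions and names (`HIT_DIM`, `STROKE_DIM`, the stroke parameterization) are illustrative assumptions, and the stage-one hitting-state predictor is omitted.

```python
# Hypothetical sketch of the two-stage idea, not the authors' implementation.
import torch
import torch.nn as nn

HIT_DIM = 6      # assumed: ball position (3) + velocity (3) at the hitting plane
STROKE_DIM = 6   # assumed: racket orientation (3) + racket velocity (3)

class Actor(nn.Module):
    """Deterministic policy: hitting state -> stroke parameters."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HIT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, STROKE_DIM), nn.Tanh(),  # strokes scaled to [-1, 1]
        )

    def forward(self, hit_state):
        return self.net(hit_state)

class Critic(nn.Module):
    """Twin Q-networks, as in TD3, to reduce value overestimation bias."""
    def __init__(self):
        super().__init__()
        def q_net():
            return nn.Sequential(
                nn.Linear(HIT_DIM + STROKE_DIM, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, 1),
            )
        self.q1, self.q2 = q_net(), q_net()

    def forward(self, hit_state, stroke):
        sa = torch.cat([hit_state, stroke], dim=-1)
        return self.q1(sa), self.q2(sa)

actor, critic = Actor(), Critic()
hit_state = torch.randn(1, HIT_DIM)   # stage-one output (predictor omitted here)
stroke = actor(hit_state)             # stage two: stroke from hitting state
q1, q2 = critic(hit_state, stroke)    # TD3 updates would use min(q1, q2)
```

Conditioning the policy on the low-dimensional hitting state, rather than on raw ball trajectories, is what keeps the exploration burden small enough for the reported sim-to-real retraining to be practical.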