Many robotic path planning problems are continuous, stochastic, and high-dimensional. Coordinating the base and manipulator of a mobile manipulator for online whole-body control is particularly challenging when both self-collision and environment-collision avoidance are required. Reinforcement learning techniques have the potential to solve such problems through their ability to generalise over environments. We study the joint penalties and joint limits of a state-of-the-art mobile manipulator whole-body controller that uses LIDAR sensing for obstacle collision avoidance, and we propose directions for improving the reinforcement learning method. Our agent achieves significantly higher success rates than the baseline in a goal-reaching environment, and it solves environments requiring coordinated whole-body control on which the baseline fails.