Mobile Manipulation (MM) systems are ideal candidates for taking up the role of a personal assistant in unstructured real-world environments. Among other challenges, MM requires effective coordination of the robot's embodiments for executing tasks that require both mobility and manipulation. Reinforcement Learning (RL) holds the promise of endowing robots with adaptive behaviors, but most methods require prohibitively large amounts of data for learning a useful control policy. In this work, we study the integration of robotic reachability priors in actor-critic RL methods for accelerating the learning of MM for reaching and fetching tasks. Namely, we consider the problem of optimal base placement and the subsequent decision of whether to activate the arm for reaching a 6D target. For this, we devise a novel Hybrid RL method that handles discrete and continuous actions jointly, resorting to the Gumbel-Softmax reparameterization. Next, we train a reachability prior using data from the operational robot workspace, inspired by classical methods. Subsequently, we derive Boosted Hybrid RL (BHyRL), a novel algorithm for learning Q-functions by modeling them as a sum of residual approximators. Every time a new task needs to be learned, we can transfer our learned residuals and learn the component of the Q-function that is task-specific, hence, maintaining the task structure from prior behaviors. Moreover, we find that regularizing the target policy with a prior policy yields more expressive behaviors. We evaluate our method in simulation in reaching and fetching tasks of increasing difficulty, and we show the superior performance of BHyRL against baseline methods. Finally, we zero-transfer our learned 6D fetching policy with BHyRL to our MM robot TIAGo++. For more details and code release, please refer to our project site: irosalab.com/rlmmbp
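To make the hybrid action space concrete, below is a minimal sketch of how a discrete arm-activation decision could be sampled with the straight-through Gumbel-Softmax relaxation alongside a continuous base-placement action, so that both action components stay differentiable for actor-critic training. This is an illustrative PyTorch snippet, not the released implementation; the class, dimension, and variable names (HybridPolicy, obs_dim, base_dim, etc.) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridPolicy(nn.Module):
    """Illustrative actor with a discrete arm-activation head (Gumbel-Softmax)
    and a continuous base-placement head (tanh-squashed Gaussian)."""

    def __init__(self, obs_dim, base_dim=3, n_discrete=2, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.discrete_logits = nn.Linear(hidden, n_discrete)  # activate arm: yes / no
        self.base_mean = nn.Linear(hidden, base_dim)          # e.g. (x, y, yaw) of the base
        self.base_log_std = nn.Linear(hidden, base_dim)

    def forward(self, obs, tau=1.0):
        h = self.trunk(obs)
        # Straight-through Gumbel-Softmax: one-hot sample in the forward pass,
        # gradients flow through the soft relaxation in the backward pass.
        arm_onehot = F.gumbel_softmax(self.discrete_logits(h), tau=tau, hard=True)
        # Continuous base placement as a tanh-squashed Gaussian sample.
        mean = self.base_mean(h)
        std = self.base_log_std(h).clamp(-5, 2).exp()
        base_action = torch.tanh(mean + std * torch.randn_like(std))
        return arm_onehot, base_action


if __name__ == "__main__":
    policy = HybridPolicy(obs_dim=16)
    arm, base = policy(torch.randn(4, 16))
    print(arm.shape, base.shape)  # (4, 2) one-hot, (4, 3) in [-1, 1]
```

The straight-through variant (hard=True) keeps the sampled discrete action one-hot while still providing a gradient signal, which is what allows the discrete and continuous components to be optimized jointly within a single actor-critic objective.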
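The boosting view of the Q-function described above can be summarized in a single expression. In the sketch below, $\rho_i$ denotes a frozen residual approximator learned on an earlier task, $\delta_N$ the residual learned for the current task, and $\alpha$ a regularization weight; this notation is illustrative and may differ from the paper's.

```latex
Q_N(s, a) \;=\; \underbrace{\sum_{i=1}^{N-1} \rho_i(s, a)}_{\text{frozen residuals from prior tasks}} \;+\; \underbrace{\delta_N(s, a)}_{\text{learned for the new task}},
\qquad
\pi_N \;=\; \arg\max_{\pi}\; \mathbb{E}_{a \sim \pi}\!\left[ Q_N(s, a) \right] \;-\; \alpha\, D_{\mathrm{KL}}\!\left( \pi(\cdot \mid s)\,\|\,\pi_{N-1}(\cdot \mid s) \right)
```

Under this decomposition, only the task-specific residual $\delta_N$ is trained when a new task arrives, while the KL term keeps the new policy close to the prior behavior, which is the regularization the abstract refers to.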