Object manipulation is a crucial capability for a service robot, but it is difficult to solve with reinforcement learning, in part because of poor sample efficiency. In this paper, we propose a novel framework, AP-NPQL (Non-Parametric Q Learning with Action Primitives), that efficiently solves object manipulation with visual input and sparse reward by combining a non-parametric policy for reinforcement learning with an appropriate behavior prior for object manipulation. We evaluate the efficiency and performance of AP-NPQL on four simulated object manipulation tasks (pushing a plate, stacking a box, flipping a cup, and picking and placing a plate), and it outperforms state-of-the-art algorithms based on parametric policies and behavior priors in terms of learning time and task success rate. We also successfully transfer the learned policy for the plate pick-and-place task to a real robot and validate it in a sim-to-real manner.