Applications of Reinforcement Learning (RL) in robotics are often limited by high data demands. On the other hand, approximate models are readily available in many robotics scenarios, making model-based approaches like planning a data-efficient alternative. Still, the performance of these methods suffers if the model is imprecise or wrong. In this sense, the respective strengths and weaknesses of RL and model-based planners are complementary. In the present work, we investigate how both approaches can be integrated into one framework that combines their strengths. We introduce Learning to Execute (L2E), which leverages information contained in approximate plans to learn universal policies that are conditioned on plans. In our robotic manipulation experiments, L2E exhibits increased performance when compared to pure RL, pure planning, or baseline methods combining learning and planning.
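To illustrate what a plan-conditioned policy means in practice, the sketch below shows a policy network that takes the current state together with an encoding of an approximate plan and outputs an action. This is a minimal illustration, not the authors' implementation; the class name, architecture, and dimensions are assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation) of a
# plan-conditioned policy: the actor receives the current state and an
# encoding of an approximate plan, and outputs an action.
import torch
import torch.nn as nn


class PlanConditionedPolicy(nn.Module):
    def __init__(self, state_dim: int, plan_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + plan_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
            nn.Tanh(),  # assumes actions normalized to [-1, 1]
        )

    def forward(self, state: torch.Tensor, plan_encoding: torch.Tensor) -> torch.Tensor:
        # Condition on the plan by concatenating state and plan features.
        return self.net(torch.cat([state, plan_encoding], dim=-1))


# Usage: a flattened approximate plan (e.g., a fixed number of waypoints)
# serves as the conditioning input alongside the current state.
policy = PlanConditionedPolicy(state_dim=10, plan_dim=32, action_dim=4)
action = policy(torch.zeros(1, 10), torch.zeros(1, 32))
```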