The ability of robots to transfer their learned knowledge to new tasks, where data is scarce, is a fundamental challenge for successful robot learning. While fine-tuning has been well studied as a simple but effective transfer approach in the context of supervised learning, it remains far less explored in reinforcement learning. In this work, we study the problem of fine-tuning in transfer reinforcement learning when tasks are parameterized by their reward functions, which are known beforehand. We conjecture that fine-tuning drastically underperforms when the source and target trajectories belong to different homotopy classes. We demonstrate that fine-tuning policy parameters across homotopy classes requires more interaction with the environment than fine-tuning within a homotopy class, and in certain cases is impossible. We propose a novel fine-tuning algorithm, Ease-In-Ease-Out fine-tuning, which consists of a relaxing stage and a curriculum learning stage to enable transfer learning across homotopy classes. Finally, we evaluate our approach on several robotics-inspired simulated environments and empirically verify that Ease-In-Ease-Out fine-tuning succeeds in a more sample-efficient way than existing baselines.