Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of varying expertise levels. It is as simple as supervised learning and Behavior Cloning (BC), but additionally takes advantage of return information. On datasets collected by policies of similar expertise, implicit BC has been shown to match or outperform explicit BC. Despite the benefits of using implicit models to learn robotic skills via BC, offline RL via Supervised Learning algorithms have so far been limited to explicit models. We show how implicit models can leverage return information and match or outperform explicit algorithms in acquiring robotic skills from fixed datasets. Furthermore, we show the close relationship between our implicit methods and other popular RL via Supervised Learning algorithms, providing a unified framework. Finally, we demonstrate the effectiveness of our method on high-dimensional manipulation and locomotion tasks.
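To make the central idea concrete, below is a minimal sketch (not the paper's exact method) of how an implicit, energy-based policy can leverage return information: the model scores (state, return, action) triples and is trained with an InfoNCE-style contrastive loss so that dataset actions receive lower energy than sampled counter-example actions. All names here (EnergyPolicy, infonce_loss, the network sizes, and the uniform negative sampling) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnergyPolicy(nn.Module):
    """Scores (state, return, action) triples; lower energy = better action."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1 + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, ret, action):
        # state: (..., state_dim), ret: (..., 1), action: (..., action_dim)
        return self.net(torch.cat([state, ret, action], dim=-1)).squeeze(-1)

def infonce_loss(model, state, ret, action,
                 num_negatives=64, action_low=-1.0, action_high=1.0):
    """Contrastive loss: the dataset action should receive lower energy than
    uniformly sampled counter-example actions, conditioned on the return."""
    batch, act_dim = action.shape
    negatives = torch.rand(batch, num_negatives, act_dim) \
        * (action_high - action_low) + action_low
    # Candidate set per example: dataset action at index 0, then negatives.
    candidates = torch.cat([action.unsqueeze(1), negatives], dim=1)
    state_rep = state.unsqueeze(1).expand(-1, 1 + num_negatives, -1)
    ret_rep = ret.unsqueeze(1).expand(-1, 1 + num_negatives, -1)
    energies = model(state_rep, ret_rep, candidates)        # (batch, 1 + N)
    labels = torch.zeros(batch, dtype=torch.long)           # positive at index 0
    return F.cross_entropy(-energies, labels)
```

At evaluation time, such an implicit policy would condition on a high target return and select the action with the lowest energy (e.g., via sampling or derivative-free optimization), which is the standard way implicit models are queried; the specifics of the paper's training and inference procedures may differ.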