Recently, reinforcement learning has enabled dexterous manipulation skills of increasing complexity. Nonetheless, learning these skills in simulation still exhibits poor sample efficiency, which stems from the fact that these skills are learned from scratch without the benefit of any domain expertise. In this work, we aim to improve the sample efficiency of learning dexterous in-hand manipulation skills using sub-optimal controllers available via domain knowledge. Our framework optimally queries the sub-optimal controllers and guides exploration toward the regions of state space relevant to the task, thereby improving sample complexity. We show that our framework allows learning from highly sub-optimal controllers, and we are the first to demonstrate learning hard-to-explore finger-gaiting in-hand manipulation skills without the use of an exploratory reset distribution.