We propose a novel learning paradigm, Self-Imitation via Reduction (SIR), for solving compositional reinforcement learning problems. SIR is based on two core ideas: task reduction and self-imitation. Task reduction tackles a hard-to-solve task by actively reducing it to an easier task whose solution is already known to the RL agent. Once the original hard task is successfully solved by task reduction, the agent naturally obtains a self-generated solution trajectory to imitate. By continuously collecting and imitating such demonstrations, the agent is able to progressively expand the solved subspace in the entire task space. Experimental results show that SIR can significantly accelerate and improve learning on a variety of challenging sparse-reward continuous-control problems with compositional structures. Code and videos are available at https://sites.google.com/view/sir-compositional.
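The reduce-then-imitate loop described above can be illustrated with a minimal toy sketch. Everything below is an illustrative assumption, not the paper's implementation: tasks are modeled as integers whose value is their difficulty, a "reduction" peels a hard task down to an easier one step by step, and "imitation" simply enlarges the set of tasks the agent can solve directly.

```python
class SIRAgent:
    """Toy sketch of Self-Imitation via Reduction (SIR).

    Hypothetical simplifications (not from the paper): a task is an
    integer equal to its difficulty; the agent can directly solve any
    task with difficulty <= self.skill; one reduction step turns task
    t into the easier task t - 1.
    """

    def __init__(self, skill=1):
        self.skill = skill        # difficulty solvable without reduction
        self.demonstrations = []  # self-generated trajectories to imitate

    def solve_directly(self, task):
        # The "solved subspace": tasks the current policy handles.
        return task <= self.skill

    def reduce_and_solve(self, task):
        """Task reduction: keep reducing the hard task until the
        remainder is already solvable, then compose the trajectory."""
        steps = []
        while not self.solve_directly(task):
            steps.append(f"reduce({task}->{task - 1})")
            task -= 1
        steps.append(f"solve({task})")
        return steps

    def self_imitate(self, trajectory):
        """Self-imitation: learning from the agent's own successful
        trajectory expands the solved subspace (here, raises skill)."""
        self.demonstrations.append(trajectory)
        self.skill += len(trajectory) - 1  # one step learned per reduction


agent = SIRAgent(skill=1)
traj = agent.reduce_and_solve(4)  # hard task reduced to an easy one
agent.self_imitate(traj)
print(agent.solve_directly(4))    # the hard task is now solvable directly
```

The key structural point the sketch captures is the feedback loop: reduction yields a success trajectory on a task the policy could not solve alone, and imitating that trajectory grows the set of directly solvable tasks, which in turn makes further reductions shorter.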