In spite of the success of existing meta reinforcement learning methods, they still have difficulty learning a meta policy effectively for RL problems with sparse rewards. To address this, we develop a novel meta reinforcement learning framework called Hyper-Meta RL (HMRL) for sparse-reward RL problems. It consists of three modules: a cross-environment meta state embedding module, which constructs a common meta state space to adapt to different environments; a meta-state-based, environment-specific meta reward shaping module, which effectively extends the original sparse-reward trajectory through cross-environment knowledge complementarity; and meta policy learning, with which the meta policy achieves better generalization and efficiency under the shaped meta reward. Experiments in sparse-reward environments show the superiority of HMRL in both transferability and policy-learning efficiency.
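For illustration only, the following is a minimal sketch of how the three described modules could fit together in code. The class and function names, the linear encoder, the potential-based form of the shaping bonus, and all dimensions are assumptions made for exposition, not the paper's actual implementation.

```python
# Minimal structural sketch of the three HMRL modules from the abstract.
# All names, shapes, and functional forms are illustrative assumptions.
import numpy as np

class MetaStateEmbedding:
    """Maps environment-specific observations into a shared meta state space."""
    def __init__(self, obs_dim, meta_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(meta_dim, obs_dim))  # assumed linear encoder

    def __call__(self, obs):
        return np.tanh(self.W @ obs)  # common meta state shared across environments

def shaped_meta_reward(sparse_reward, meta_state, next_meta_state, potential, gamma=0.99):
    """Environment-specific meta reward shaping on top of the sparse reward.

    Shaping is sketched here as a potential-based bonus over meta states,
    which densifies the originally sparse reward signal.
    """
    bonus = gamma * potential(next_meta_state) - potential(meta_state)
    return sparse_reward + bonus

class MetaPolicy:
    """Policy learned on the meta state, trained with the shaped meta reward."""
    def __init__(self, meta_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.theta = rng.normal(scale=0.1, size=(n_actions, meta_dim))

    def act(self, meta_state):
        logits = self.theta @ meta_state
        return int(np.argmax(logits))  # greedy action for illustration

# Illustrative usage with made-up dimensions.
embed = MetaStateEmbedding(obs_dim=8, meta_dim=4)
policy = MetaPolicy(meta_dim=4, n_actions=3)
obs, next_obs = np.ones(8), np.zeros(8)
s, s_next = embed(obs), embed(next_obs)
r = shaped_meta_reward(0.0, s, s_next, potential=lambda z: float(z.sum()))
a = policy.act(s)
print(r, a)
```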