Reinforcement learning (RL) often struggles to accomplish sparse-reward, long-horizon tasks in complex environments. Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals. In GCRL, exploring novel sub-goals is essential for the agent to ultimately find a pathway to the desired goal, and how to explore novel sub-goals efficiently remains one of the most challenging issues in GCRL. Several goal exploration methods have been proposed to address this issue but still struggle to find the desired goals efficiently. In this paper, we propose a novel learning objective that optimizes the entropy of both achieved goals and new goals to be explored, enabling more efficient goal exploration in sub-goal selection-based GCRL. To optimize this objective, we first explore and exploit the frequently occurring goal-transition patterns mined in environments similar to the current task, composing them into skills via skill learning. The pretrained skills are then applied during goal exploration. Evaluation on a variety of sparse-reward, long-horizon benchmark tasks suggests that incorporating our method into several state-of-the-art GCRL baselines significantly boosts their exploration efficiency while improving or maintaining their performance. The source code is available at: https://github.com/GEAPS/GEAPS.