Hierarchical reinforcement learning (HRL) effectively improves agents' exploration efficiency on tasks with sparse rewards, guided by high-quality hierarchical structures (e.g., subgoals or options). However, automatically discovering such high-quality hierarchical structures remains a great challenge. Previous HRL methods can hardly discover hierarchical structures in complex environments because their randomness-driven exploration paradigm yields low exploration efficiency. To address this issue, we propose CDHRL, a causality-driven hierarchical reinforcement learning framework that leverages causality-driven discovery instead of randomness-driven exploration to effectively build high-quality hierarchical structures in complicated environments. The key insight is that the causalities among environment variables are a natural fit for modeling reachable subgoals and their dependencies, and can therefore guide the construction of high-quality hierarchical structures. Results in two complex environments, 2D-Minecraft and Eden, show that CDHRL significantly boosts exploration efficiency with its causality-driven paradigm.
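To make the key insight concrete, the following is a minimal illustrative sketch, not the paper's implementation: it assumes a hypothetical causal graph over environment variables (the variable names below are invented for a Minecraft-like setting) and shows how such a graph could induce an ordering of reachable subgoals, where a subgoal is considered only after the variables it causally depends on.

```python
# Illustrative sketch (assumed, not CDHRL's actual algorithm): derive a subgoal
# ordering from a causal graph over environment variables.
from graphlib import TopologicalSorter

# Hypothetical causal graph: each key is an environment variable, each value is
# the set of variables it causally depends on (its prerequisites).
causal_parents = {
    "wood": set(),
    "stick": {"wood"},
    "stone": {"wood"},
    "stone_pickaxe": {"stick", "stone"},
    "iron": {"stone_pickaxe"},
}

# A topological order over the causal graph yields a curriculum of subgoals:
# earlier variables can serve as subgoals when learning to control later ones.
subgoal_order = list(TopologicalSorter(causal_parents).static_order())
print(subgoal_order)  # e.g. ['wood', 'stick', 'stone', 'stone_pickaxe', 'iron']
```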