We address the problem of efficient exploration for transition model learning in the relational model-based reinforcement learning setting without extrinsic goals or rewards. Inspired by human curiosity, we propose goal-literal babbling (GLIB), a simple and general method for exploration in such problems. GLIB samples relational conjunctive goals that can be understood as specific, targeted effects that the agent would like to achieve in the world, and plans to achieve these goals using the transition model being learned. We provide theoretical guarantees showing that exploration with GLIB will converge almost surely to the ground truth model. Experimentally, we find GLIB to strongly outperform existing methods in both prediction and planning on a range of tasks, encompassing standard PDDL and PPDDL planning benchmarks and a robotic manipulation task implemented in the PyBullet physics simulator. Video: https://youtu.be/F6lmrPT6TOY Code: https://git.io/JIsTB
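To make the sample-then-plan loop described above concrete, the following is a minimal sketch and not the authors' implementation. It assumes a deterministic lookup-table model (a dict from (state, action) to predicted next state) as a stand-in for the learned relational transition model, uniform sampling of ground conjunctive goals, and breadth-first search as the planner. All function names, the toy predicates `holding` and `on`, and the actions `pick_a` and `stack_a_b` are hypothetical.

```python
import itertools
import random
from collections import deque


def ground_literals(predicates, objects):
    """Enumerate all ground literals, e.g. ('on', 'a', 'b').

    predicates maps a predicate name to its arity (hypothetical toy format).
    """
    literals = []
    for name, arity in sorted(predicates.items()):
        for args in itertools.permutations(objects, arity):
            literals.append((name,) + args)
    return literals


def sample_goal(literals, max_size, rng):
    """Sample a conjunctive goal: a set of 1..max_size ground literals."""
    size = rng.randint(1, max_size)
    return frozenset(rng.sample(literals, size))


def plan(state, goal, model, max_depth=5):
    """Breadth-first search through the model learned so far.

    model maps (state, action) -> predicted next state (assumed deterministic).
    Returns a list of actions, or None if no plan is found within max_depth.
    """
    if goal <= state:
        return []
    frontier = deque([(state, [])])
    visited = {state}
    while frontier:
        current, actions = frontier.popleft()
        if len(actions) >= max_depth:
            continue
        for (s, a), nxt in model.items():
            if s != current or nxt in visited:
                continue
            if goal <= nxt:
                return actions + [a]
            visited.add(nxt)
            frontier.append((nxt, actions + [a]))
    return None


def choose_action(state, literals, model, actions, rng, tries=10, max_size=2):
    """Babble goals until one is plannable; otherwise fall back to a random action."""
    for _ in range(tries):
        goal = sample_goal(literals, max_size, rng)
        if goal <= state:
            continue  # goal already holds, so it would not drive exploration
        found = plan(state, goal, model)
        if found:
            return found[0], goal
    return rng.choice(actions), None
```

In an exploration loop under these assumptions, the agent would call `choose_action`, execute the returned action in the environment, record the observed transition, and update the model before sampling the next goal. The random fallback keeps the agent acting when no sampled goal is reachable under the current model.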