Reinforcement learning (RL) enables an agent to learn from trial-and-error experience in pursuit of long-term goals; automated planning computes plans for accomplishing tasks using action knowledge. Despite their shared goal of completing complex tasks, RL and automated planning have developed largely in isolation because of their different computational modalities. To improve the learning efficiency of RL agents, we develop Guided Dyna-Q (GDQ), which enables RL agents to reason with action knowledge and thereby avoid exploring less-relevant states. The action knowledge is used to generate artificial experiences from an optimistic simulation. GDQ has been evaluated in simulation and on a mobile robot conducting navigation tasks in a multi-room office environment. Compared with competitive baselines, GDQ significantly reduces exploration effort while improving the quality of learned policies.
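To make the idea of generating artificial experiences from an optimistic simulation concrete, the following is a minimal tabular sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the one-dimensional corridor, the `slip` probability, the reward values, and the helper names `optimistic_model`, `optimistic_plan`, `warm_start`, and `guided_dyna_q`. A stand-in planner proposes a plan, an optimistic model (every action is assumed to succeed) turns that plan into simulated transitions, and those transitions update the Q-table before and alongside ordinary epsilon-greedy Q-learning in the real, noisier environment.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")


def optimistic_model(state, action, n_states=6):
    """Hypothetical action knowledge: assumes every action succeeds."""
    if action == "left":
        nxt = max(0, state - 1)
    else:
        nxt = min(n_states - 1, state + 1)
    done = nxt == n_states - 1          # goal is the right-most state
    reward = 1.0 if done else -0.01     # illustrative reward values
    return nxt, reward, done


def real_step(state, action, rng, slip=0.2, n_states=6):
    """Toy 'real' environment: with probability `slip` the agent stays put."""
    if rng.random() < slip:
        return state, -0.01, False
    return optimistic_model(state, action, n_states)


def optimistic_plan(start, n_states=6):
    """Stand-in for an automated planner: walk right until the goal."""
    return [(s, "right") for s in range(start, n_states - 1)]


def q_update(q, s, a, r, s2, done, alpha=0.5, gamma=0.95):
    """Standard one-step Q-learning update."""
    target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def warm_start(q, sweeps=5, n_states=6):
    """Replay artificial experiences generated from the optimistic plan."""
    for _ in range(sweeps):
        for s, a in optimistic_plan(0, n_states):
            s2, r, done = optimistic_model(s, a, n_states)
            q_update(q, s, a, r, s2, done)


def guided_dyna_q(episodes=20, sweeps=5, epsilon=0.1, slip=0.2,
                  seed=0, n_states=6):
    rng = random.Random(seed)
    q = defaultdict(float)
    warm_start(q, sweeps, n_states)     # simulated experience first
    for _ in range(episodes):           # then real interaction
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: q[(s, b)])
            s2, r, done = real_step(s, a, rng, slip, n_states)
            q_update(q, s, a, r, s2, done)
            s = s2
        warm_start(q, 1, n_states)      # Dyna-style replay after each episode
    return q


if __name__ == "__main__":
    q = guided_dyna_q()
    for s in range(5):
        print(s, max(ACTIONS, key=lambda b: q[(s, b)]))
```

In this toy setting the simulated sweeps alone are enough to make "right" the greedy action in every non-goal state, so the agent spends little real interaction on states the plan never visits. How the actual GDQ system represents action knowledge and selects plans is not specified in the abstract and is not modeled here.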