Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning

Reinforcement learning (RL) studies how an agent comes to achieve reward in an environment through interactions over time. Recent advances in machine RL have surpassed human expertise at the world's oldest board games and many classic video games, but they require vast quantities of experience to learn successfully -- none of today's algorithms account for the human ability to learn so many different tasks, so quickly. Here we propose a new approach to this challenge based on a particularly strong form of model-based RL which we call Theory-Based Reinforcement Learning, because it uses human-like intuitive theories -- rich, abstract, causal models of physical objects, intentional agents, and their interactions -- to explore and model an environment, and plan effectively to achieve task goals. We instantiate the approach in a video game playing agent called EMPA (the Exploring, Modeling, and Planning Agent), which performs Bayesian inference to learn probabilistic generative models expressed as programs for a game-engine simulator, and runs internal simulations over these models to support efficient object-based, relational exploration and heuristic planning. EMPA closely matches human learning efficiency on a suite of 90 challenging Atari-style video games, learning new games in just minutes of game play and generalizing robustly to new game situations and new levels. The model also captures fine-grained structure in people's exploration trajectories and learning dynamics. Its design and behavior suggest a way forward for building more general human-like AI systems.
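The loop the abstract describes (infer a model by Bayesian inference, then plan by simulating under that model) can be illustrated with a toy sketch. It is not EMPA's implementation: EMPA infers programs for a game-engine simulator, whereas this example uses three hand-written candidate theories about a single object. Every name here (`THEORIES`, `PREDICTED_OUTCOME`, `REWARD`, `NOISE`, the "gem" object, and the two actions) is a hypothetical chosen for illustration.

```python
# Toy sketch only: Bayesian inference over a tiny, hand-written set of
# candidate "theories", followed by one-step planning under the posterior.
# All names and numbers are illustrative assumptions, not taken from the paper.

THEORIES = ("gem_rewards", "gem_kills", "gem_inert")

# What each theory predicts will be observed when the avatar touches a gem.
PREDICTED_OUTCOME = {
    "gem_rewards": "score_up",
    "gem_kills": "avatar_dies",
    "gem_inert": "nothing",
}

# Assumed reward for touching the gem if a given theory is true.
REWARD = {"gem_rewards": 1.0, "gem_kills": -10.0, "gem_inert": 0.0}

NOISE = 0.1  # assumed probability that an observation contradicts the theory


def likelihood(theory, observation):
    """P(observation | theory): the prediction holds with probability 1 - NOISE."""
    if observation == PREDICTED_OUTCOME[theory]:
        return 1.0 - NOISE
    return NOISE / 2  # remaining mass split over the two other outcomes


def posterior_update(prior, observation):
    """Apply Bayes' rule once over the finite set of theories."""
    unnormalized = {t: p * likelihood(t, observation) for t, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under every theory")
    return {t: w / total for t, w in unnormalized.items()}


def expected_reward(belief, action):
    """Simulate the action under each theory and weight by current belief."""
    if action == "avoid":
        return 0.0  # avoiding the gem yields nothing under any theory
    return sum(p * REWARD[t] for t, p in belief.items())


def best_action(belief):
    """Pick the action with the highest expected reward under the belief."""
    return max(("touch", "avoid"), key=lambda a: expected_reward(belief, a))
```

Starting from a uniform prior, `best_action` prefers `"avoid"` because the possible penalty dominates. After one `"score_up"` observation is passed through `posterior_update`, the belief concentrates on `"gem_rewards"` and the preferred action becomes `"touch"`. This toy omits the exploration component entirely; the agent here only updates on observations it is handed.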
