Tasks in which the set of possible actions depends discontinuously on the state pose a significant challenge for current reinforcement learning algorithms. For example, a locked door must first be unlocked, and then its handle turned, before the door can be opened. The sequential nature of these tasks makes final rewards difficult to obtain, and transferring information between task variants through continuous learned values, such as network weights, rather than discrete symbols can be inefficient. Our key insight is that agents that act and think symbolically are often more effective on such tasks. We propose a memory-based learning approach that exploits the symbolic nature of the constraints and the temporal ordering of actions in these tasks to quickly acquire and transfer high-level information. We evaluate memory-based learning on both real and simulated tasks with approximately discontinuous constraints between states and actions, and show that our method learns to solve these tasks an order of magnitude faster than both model-based and model-free deep reinforcement learning methods.