This paper proposes DeepSynth, a method for effective training of deep Reinforcement Learning (RL) agents when the reward is sparse and non-Markovian, but at the same time progress towards the reward requires achieving an unknown sequence of high-level objectives. Our method employs a novel algorithm for synthesis of compact automata to uncover this sequential structure automatically. We synthesise a human-interpretable automaton from trace data collected by exploring the environment. The state space of the environment is then enriched with the synthesised automaton so that the generation of a control policy by deep RL is guided by the discovered structure encoded in the automaton. The proposed approach is able to cope with both high-dimensional, low-level features and unknown sparse non-Markovian rewards. We have evaluated DeepSynth's performance in a set of experiments that includes the Atari game Montezuma's Revenge. Compared to existing approaches, we obtain a reduction of two orders of magnitude in the number of iterations required for policy synthesis, and also a significant improvement in scalability.
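To make the product construction described above concrete, the following is a minimal sketch, not the authors' implementation: a wrapper that enriches a Gym-style environment's observations with the current state of a synthesised automaton, so that a deep RL agent learns over the product space. The transition table `delta`, the labelling function `label`, and integer-indexed DFA states are illustrative assumptions.

```python
# Minimal sketch (assumed interface, not the DeepSynth code): enrich the
# environment state with the state of a synthesised automaton (DFA).
import numpy as np

class ProductEnv:
    def __init__(self, env, delta, n_dfa_states, q0, accepting, label):
        self.env = env
        self.delta = delta          # dict: (dfa_state, event) -> dfa_state
        self.n = n_dfa_states       # DFA states assumed indexed 0..n-1
        self.q0 = q0
        self.accepting = accepting  # set of accepting DFA states
        self.label = label          # maps a raw observation to a high-level event
        self.q = q0

    def reset(self):
        self.q = self.q0
        return self._augment(self.env.reset())

    def step(self, action):
        obs, env_reward, done, info = self.env.step(action)
        event = self.label(obs)
        self.q = self.delta.get((self.q, event), self.q)
        # Reward is shaped by progress in the automaton: reaching an accepting
        # DFA state signals completion of the discovered high-level sequence,
        # making the sparse non-Markovian objective Markovian over the product.
        reward = env_reward + (1.0 if self.q in self.accepting else 0.0)
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        # One-hot encode the automaton state and append it to the observation.
        one_hot = np.zeros(self.n, dtype=np.float32)
        one_hot[self.q] = 1.0
        return np.concatenate([np.asarray(obs, dtype=np.float32).ravel(), one_hot])
```

Any off-the-shelf deep RL algorithm (e.g. DQN) can then be trained on `ProductEnv` unchanged, since the automaton state is simply part of the observation.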