Despite achieving great success in real-world applications, Deep Reinforcement Learning (DRL) still suffers from three critical issues: low data efficiency, lack of interpretability, and poor transferability. Recent research shows that embedding symbolic knowledge into DRL is a promising way to address these challenges. Inspired by this, we introduce a novel deep reinforcement learning framework with symbolic options. The framework features a loop training procedure that guides policy improvement by planning with action models and symbolic options learned automatically from interaction trajectories. The learned symbolic options reduce the heavy reliance on expert domain knowledge and make the resulting policies inherently interpretable. Moreover, planning with the action models further improves transferability and data efficiency. To validate the effectiveness of the framework, we conduct experiments on two domains, Montezuma's Revenge and Office World. The results demonstrate comparable performance together with improved data efficiency, interpretability, and transferability.