Deep Reinforcement Learning (RL) involves the use of Deep Neural Networks (DNNs) to make sequential decisions in order to maximize reward. For many tasks, the resulting sequence of actions produced by a Deep RL policy can be long and difficult for humans to understand. A crucial component of human explanations is selectivity, whereby only key decisions and causes are recounted. Imbuing Deep RL agents with such an ability would make their resulting policies easier to understand from a human perspective and would yield a concise set of instructions to aid the learning of future agents. To this end, we use a Deep RL agent with an episodic memory system to identify and recount key decisions during policy execution. We show that these decisions form a short, human-readable explanation that can also be used to speed up the learning of naive Deep RL agents in an algorithm-independent manner.