Reinforcement Learning (RL) is a popular machine learning paradigm in which intelligent agents interact with an environment to fulfill a long-term goal. Driven by the resurgence of deep learning, Deep RL (DRL) has achieved great success over a wide spectrum of complex control tasks. Despite these encouraging results, the deep neural network backbone is widely regarded as a black box that impedes practitioners from trusting and deploying trained agents in realistic scenarios where high security and reliability are essential. To alleviate this issue, a large body of literature has been devoted to shedding light on the inner workings of intelligent agents, either by constructing intrinsic interpretability or by providing post-hoc explainability. In this survey, we provide a comprehensive review of existing work on eXplainable RL (XRL) and introduce a new taxonomy in which prior work is clearly categorized into model-explaining, reward-explaining, state-explaining, and task-explaining methods. We also review and highlight RL methods that conversely leverage human knowledge to promote the learning efficiency and performance of agents, a direction that is often overlooked in the XRL field. Some challenges and opportunities in XRL are discussed. This survey intends to provide a high-level summary of XRL and to motivate future research on more effective XRL solutions. Corresponding open-source code is collected and categorized at https://github.com/Plankson/awesome-explainable-reinforcement-learning.