Although deep reinforcement learning has become a promising machine learning approach for sequential decision-making problems, it is still not mature enough for high-stakes domains such as autonomous driving or medical applications. In such contexts, a learned policy needs, for instance, to be interpretable, so that it can be inspected before any deployment (e.g., for safety and verifiability reasons). This survey provides an overview of various approaches for achieving higher interpretability in reinforcement learning (RL). To that end, we distinguish interpretability (as an intrinsic property of a model) from explainability (as a post-hoc operation, with the intervention of a proxy) and discuss them in the context of RL, with an emphasis on the former notion. In particular, we argue that interpretable RL may embrace different facets: interpretable inputs, interpretable (transition/reward) models, and interpretable decision-making. Based on this scheme, we summarize and analyze recent work related to interpretable RL, with an emphasis on papers published in the past 10 years. We also briefly discuss some related research areas and point to some potentially promising research directions.