Interpretability, explainability and transparency are key issues in introducing Artificial Intelligence methods in many critical domains. This is important due to ethical concerns and trust issues, which are strongly connected to reliability, robustness, auditability and fairness, and it has important consequences for keeping the human in the loop at high levels of automation, especially in critical decision-making cases, where both the human and the machine play important roles. While the research community has given much attention to the explainability of closed (or black) prediction boxes, there is a tremendous need for explainability of closed-box methods that support agents acting autonomously in the real world. Reinforcement learning methods, and especially their deep versions, are such closed-box methods. In this article we aim to provide a review of state-of-the-art methods for explainable deep reinforcement learning, also taking into account the needs of human operators, i.e., of those who take the actual and critical decisions in solving real-world problems. We provide a formal specification of the deep reinforcement learning explainability problems, and we identify the necessary components of a general explainable reinforcement learning framework. Based on these, we provide a comprehensive review of state-of-the-art methods, categorizing them into classes according to the paradigm they follow, the interpretable models they use, and the surface representation of the explanations they provide. The article concludes by identifying open questions and important challenges.