Understanding emergent behaviors of reinforcement learning (RL) agents may be difficult since such agents are often trained in complex environments using highly complex decision-making procedures. This has given rise to a variety of approaches to explainability in RL that aim to reconcile discrepancies that may arise between the behavior of an agent and the behavior that is anticipated by an observer. Most recent approaches have relied either on domain knowledge that may not always be available, on an analysis of the agent's policy, or on an analysis of specific elements of the underlying environment, typically modeled as a Markov Decision Process (MDP). Our key claim is that even if the underlying model is not fully known (e.g., the transition probabilities have not been accurately learned) or is not maintained by the agent (i.e., when using model-free methods), the model can nevertheless be exploited to automatically generate explanations. For this purpose, we suggest using formal MDP abstractions and transforms, previously used in the literature for expediting the search for optimal policies, to automatically produce explanations. Since such transforms are typically based on a symbolic representation of the environment, they can provide meaningful explanations for gaps between the anticipated and actual agent behavior. We formally define the explainability problem, suggest a class of transforms that can be used for explaining emergent behaviors, and suggest methods that enable efficient search for an explanation. We demonstrate the approach on a set of standard benchmarks.