While reinforcement learning (RL) algorithms have been successfully applied to numerous tasks, their reliance on neural networks makes their behavior difficult to understand and trust. Counterfactual explanations are human-friendly explanations that offer users actionable advice on how to alter the model inputs to achieve the desired output from a black-box system. However, current approaches to generating counterfactuals in RL ignore the stochastic and sequential nature of RL tasks and can produce counterfactuals that are difficult to obtain or do not deliver the desired outcome. In this work, we propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behavior of RL agents. We first propose and implement a set of RL-specific counterfactual properties that ensure easily reachable counterfactuals with highly probable desired outcomes. We then use a heuristic tree search of the agent's execution trajectories to find the most suitable counterfactuals based on the defined properties. We evaluate RACCER in two tasks and conduct a user study, showing that RL-specific counterfactuals help users better understand an agent's behavior compared to current state-of-the-art approaches.