In recent years, advances in deep learning have led to a plethora of successes in applying reinforcement learning (RL) to complex sequential decision tasks with high-dimensional inputs. However, existing RL-based systems are essentially competency-unaware: they lack the interpretation mechanisms needed to give human operators an insightful, holistic view of their competence, which impedes their adoption, particularly in critical applications where an agent's decisions can have significant consequences. In this paper, we extend a recently proposed framework for explainable RL that is based on analyses of "interestingness." Our new framework provides various measures of RL agent competence derived from interestingness analysis and is applicable to a wide range of RL algorithms. We also propose novel mechanisms for assessing RL agents' competencies that: 1) identify agent behavior patterns and competency-controlling conditions by clustering agent behavior traces solely according to interestingness data; and 2) identify the task elements most responsible for an agent's behavior, as measured through interestingness, by performing global and local analyses using SHAP values. Overall, our tools provide insights about RL agent competence, both its capabilities and limitations, enabling users to make more informed decisions about interventions, additional training, and other interactions in collaborative human-machine settings.
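As a minimal sketch of the first mechanism, clustering agent behavior traces solely from interestingness data can be illustrated as follows. This assumes each trace is summarized as a vector of interestingness dimensions; the dimension names and data below are synthetic and purely illustrative, not the paper's actual measures:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic example: each row summarizes one agent behavior trace along
# hypothetical interestingness dimensions (names are illustrative):
# [confidence, riskiness, goal_conduciveness, incongruity]
rng = np.random.default_rng(0)
competent = rng.normal([0.8, 0.1, 0.7, 0.1], 0.05, size=(30, 4))
struggling = rng.normal([0.3, 0.6, 0.2, 0.6], 0.05, size=(30, 4))
traces = np.vstack([competent, struggling])

# Cluster traces using only interestingness features; each cluster is a
# candidate behavior pattern, and its centroid characterizes the
# competency-controlling conditions typical of that pattern.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(traces)
labels = km.labels_
centroids = km.cluster_centers_
print(labels)
```

In a real analysis, the cluster assignments would then be inspected against the underlying traces (and, per the second mechanism, SHAP values over task features) to explain what drives each behavior pattern.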