Why? Why not? When? Visual Explanations of Agent Behavior in Reinforcement Learning

Reinforcement learning (RL) is used in many domains, including autonomous driving, robotics, stock trading, and video games. Unfortunately, the black-box nature of RL agents, combined with legal and ethical considerations, makes it increasingly important that humans (including those who are not experts in RL) understand the reasoning behind the actions taken by an RL agent, particularly in safety-critical domains. To help address this challenge, we introduce PolicyExplainer, a visual analytics interface that lets the user directly query an autonomous agent. PolicyExplainer visualizes the states, policy, and expected future rewards for an agent, and supports asking and answering questions such as: Why take this action? Why not take this other action? When is this action taken? PolicyExplainer is designed based on a domain analysis with RL researchers, and is evaluated via qualitative and quantitative assessments on a trio of domains: taxi navigation, a stack bot domain, and drug recommendation for HIV patients. We find that PolicyExplainer promotes trust in, and understanding of, agent decisions better than a state-of-the-art text-based explanation approach. Interviews with domain practitioners provide further validation for PolicyExplainer as applied to safety-critical domains. Our results help demonstrate how visualization-based approaches can be leveraged to decode the behavior of autonomous RL agents, particularly for RL non-experts.
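To make the three question types concrete, the following is a minimal sketch of how "Why?", "Why not?", and "When?" could be answered from a table of expected future rewards. It is not PolicyExplainer's implementation: the state names, action names, reward values, and the helper functions `policy`, `why`, `why_not`, and `when` are all hypothetical, and the agent is assumed to act greedily on a small tabular Q-function loosely modeled on a taxi domain.

```python
from typing import Dict, List

# Hypothetical Q-table: expected future reward for each (state, action) pair.
# All names and numbers are invented for illustration.
Q: Dict[str, Dict[str, float]] = {
    "passenger_waiting": {"pickup": 10.0, "move": 2.0, "dropoff": -10.0},
    "passenger_onboard": {"pickup": -10.0, "move": 5.0, "dropoff": -10.0},
    "at_destination": {"pickup": -10.0, "move": -1.0, "dropoff": 20.0},
}


def policy(state: str) -> str:
    """Assumed greedy policy: pick the action with the highest expected reward."""
    return max(Q[state], key=lambda action: Q[state][action])


def why(state: str) -> str:
    """Why take this action? Report the chosen action and its expected reward."""
    action = policy(state)
    return (
        f"In '{state}' the agent takes '{action}' because its expected "
        f"future reward ({Q[state][action]:.1f}) is the highest available."
    )


def why_not(state: str, action: str) -> str:
    """Why not take this other action? Report the reward gap to the chosen one."""
    best = policy(state)
    if action == best:
        return f"'{action}' is the action the agent takes in '{state}'."
    gap = Q[state][best] - Q[state][action]
    return (
        f"In '{state}' the agent prefers '{best}' over '{action}' because "
        f"'{action}' is expected to yield {gap:.1f} less future reward."
    )


def when(action: str) -> List[str]:
    """When is this action taken? List the states where the policy selects it."""
    return [state for state in Q if policy(state) == action]


if __name__ == "__main__":
    print(why("passenger_waiting"))
    print(why_not("passenger_waiting", "move"))
    print(when("dropoff"))
```

Each answer in this sketch is derived only from the policy and the expected future rewards, which are the same quantities the abstract says the interface visualizes; a real system would also need to summarize large state spaces rather than enumerate them.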
