Reinforcement learning (RL) is widely used in many domains, including autonomous driving, robotics, stock trading, and video games. Unfortunately, the black-box nature of RL agents, combined with growing legal and ethical considerations, makes it increasingly important that humans understand the reasoning behind the actions taken by an RL agent, particularly in safety-critical domains. To help address this challenge, we introduce PolicyExplainer, a visual analytics interface that lets the user directly query an RL agent. PolicyExplainer visualizes the states, policy, and expected future rewards for an agent, and supports asking and answering questions such as "Why take this action?", "Why not this other action?", and "When is this action taken?". PolicyExplainer is designed based on a domain analysis with RL experts and is evaluated via empirical assessments on three domains: taxi navigation, an inventory application, and the safety-critical domain of drug recommendation for HIV patients.