Human-AI collaborative policy synthesis is a procedure in which (1) a human initializes an autonomous agent's behavior, (2) reinforcement learning improves the human-specified behavior, and (3) the agent explains the final optimized policy to the user. This paradigm leverages human expertise and provides greater insight into an agent's learned behaviors. Existing approaches to collaborative policy specification rely on black-box methods that are unintelligible and not catered to non-expert end-users. In this paper, we develop a novel collaborative framework, rooted in principles of human-centered design, that enables humans to initialize and interpret an autonomous agent's behavior. Through our framework, humans specify an initial behavior model in unstructured natural language, which we convert to lexical decision trees. We then leverage these human-specified policies to warm-start reinforcement learning, allowing the agent to further optimize them. Finally, to close the loop on human specification, we produce explanations of the final learned policy in multiple modalities, giving the user a clear depiction of the agent's learned behavior. We validate our approach by showing that our model achieves >80% accuracy and that human-initialized policies successfully warm-start RL. We then conduct a novel human-subjects study quantifying the relative subjective and objective benefits of varying XAI modalities (e.g., Tree, Language, and Program) for explaining learned policies to end-users in terms of usability and interpretability, and we identify the circumstances that influence these measures. Our findings emphasize the need for personalized explainable systems that can facilitate user-centric policy explanations for a variety of end-users.
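To make the warm-start step concrete, the sketch below illustrates one plausible way a human-specified lexical decision tree could seed reinforcement learning: roll the tree over sampled states to build an imitation dataset, pre-train the RL policy on it, then continue with standard reward-driven updates. This is a minimal illustration under assumed details, not the paper's implementation; every name (tree_policy, collect_warmstart_data, the toy "light" observation) is hypothetical.

```python
# Minimal sketch of warm-starting RL from a human-specified decision tree.
# Assumption: the RL policy is first fit to (state, action) pairs produced
# by the tree (behavior cloning), then refined with environment reward.
# All names and the toy domain below are illustrative, not from the paper.
import random

def tree_policy(obs):
    """A toy lexical decision tree, e.g. parsed from the instruction
    'if the light is red, stop; otherwise go'."""
    return "stop" if obs["light"] == "red" else "go"

def collect_warmstart_data(n=1000):
    """Roll the human-specified tree over sampled states to build an
    imitation dataset for pre-training the RL policy."""
    data = []
    for _ in range(n):
        obs = {"light": random.choice(["red", "green"])}
        data.append((obs, tree_policy(obs)))
    return data

# The learner would first fit its policy network to this dataset, then
# continue optimizing with its usual policy-gradient or Q-learning
# updates, starting from the human-initialized behavior rather than
# from a random policy.
warmstart_dataset = collect_warmstart_data()
```

Seeding the learner this way is what allows the agent to begin from sensible, human-endorsed behavior instead of random exploration, which is the benefit the abstract's warm-start claim refers to.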