Artificial intelligence systems increasingly involve continual learning to enable flexibility in general situations that are not encountered during system training. Human interaction with autonomous systems is broadly studied, but research has hitherto under-explored interactions that occur while the system is actively learning and can noticeably change its behaviour within minutes. In this pilot study, we investigate how the interaction between a human and a continually learning prediction agent develops as the agent develops competency. Additionally, we compare two different agent architectures to assess how representational choices in agent design affect the human-agent interaction. We develop a virtual reality environment and a time-based prediction task wherein learned predictions from a reinforcement learning (RL) algorithm augment human predictions. We assess how a participant's performance and behaviour in this task differ across agent types, using both quantitative and qualitative analyses. Our findings suggest that human trust in the system may be influenced by early interactions with the agent, and that trust in turn affects strategic behaviour, but limitations of the pilot study preclude any conclusive statement. We identify trust as a key feature of interaction to focus on when considering RL-based technologies, and make several recommendations for modifying this study in preparation for a larger-scale investigation. A video summary of this paper can be found at https://youtu.be/oVYJdnBqTwQ .