A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central challenges of artificial intelligence (AI) research: complex visual perception and goal-directed physical control, grounded language comprehension and production, and multi-agent social interaction. To build agents that can robustly interact with humans, we would ideally train them while they interact with humans. However, this is presently impractical. Therefore, we approximate the role of the human with another learned agent, and use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour. Rigorously evaluating our agents poses a great challenge, so we develop a variety of behavioural tests, including evaluation by humans who watch videos of agents or interact directly with them. These evaluations convincingly demonstrate that interactive training and auxiliary losses improve agent behaviour beyond what is achieved by supervised learning of actions alone. Further, we demonstrate that agent capabilities generalise beyond literal experiences in the dataset. Finally, we train evaluation models whose ratings of agents agree well with human judgement, thus permitting the evaluation of new agent models without additional effort. Taken together, our results in this virtual environment provide evidence that large-scale human behavioural imitation is a promising tool to create intelligent, interactive agents, and that the challenge of reliably evaluating such agents is surmountable.