Human-robot collaboration is an essential research topic in artificial intelligence (AI): it enables researchers to devise cognitive AI systems and affords users an intuitive means of interacting with robots. In such collaboration, communication plays a central role. To date, prior studies in embodied agent navigation have demonstrated only that human languages facilitate communication, via instructions given in natural language. Nevertheless, a plethora of other forms of communication remains unexplored. In fact, human communication originated in gestures and is oftentimes delivered through multimodal cues, e.g., "go there" accompanied by a pointing gesture. To bridge this gap and fill in the missing dimension of communication in embodied agent navigation, we propose investigating the effects of using gestures, instead of verbal cues, as the communicative interface. Specifically, we develop a VR-based 3D simulation environment, named Ges-THOR, built on the AI2-THOR platform. In this virtual environment, a human player is placed in the same virtual scene and shepherds the artificial agent using only gestures. The agent is tasked with solving the navigation problem guided by natural gestures with unknown semantics; we do not use any predefined gestures, owing to the diversity and versatile nature of human gestures. We argue that learning the semantics of natural gestures is mutually beneficial to learning the navigation task: learn to communicate, and communicate to learn. In a series of experiments, we demonstrate that human gesture cues, even without predefined semantics, improve object-goal navigation for an embodied agent, outperforming various state-of-the-art methods.