Building embodied intelligent agents that can interact with 3D indoor environments has received increasing research attention in recent years. While most prior work focuses on single-object or agent-object visual functionality and affordances, we propose to study a new kind of visual relationship that is also important to perceive and model: inter-object functional relationships (e.g., a switch on the wall turns a light on or off, a remote control operates the TV). Humans infer these relationships with little or no effort, even when entering a new room, by relying on strong prior knowledge (e.g., we know that buttons control electrical devices) or, in cases of uncertainty (e.g., multiple switches and lights in the same room), on only a few exploratory interactions. In this paper, we take a first step toward building AI systems that learn inter-object functional relationships in 3D indoor environments. Our key technical contributions are modeling prior knowledge by training over large-scale scenes, and designing interactive policies that effectively explore the training scenes and quickly adapt to novel test scenes. We create a new benchmark based on the AI2Thor and PartNet datasets and perform extensive experiments that demonstrate the effectiveness of the proposed method. Results show that our model successfully learns priors and fast-interactive-adaptation strategies for exploring inter-object functional relationships in complex 3D scenes. Several ablation studies further validate the usefulness of each proposed module.