To bridge the gap between machine and human intelligence, it is important to introduce environments that are visually realistic and rich in content. In such environments, one can evaluate and improve a crucial property of practical intelligent systems, namely \emph{generalization}. In this work, we build \emph{House3D}, a rich, extensible and efficient environment that contains 45,622 human-designed 3D scenes of houses, ranging from single-room studios to multi-storeyed houses, equipped with a diverse set of fully labeled 3D objects, textures and scene layouts, based on the SUNCG dataset (Song et al., 2017). With an emphasis on semantic-level generalization, we study the task of concept-driven navigation, \emph{RoomNav}, using a subset of houses in House3D. In RoomNav, an agent navigates towards a target specified by a semantic concept. To succeed, the agent must learn to comprehend the scene it lives in by developing perception, understand the concept by mapping it to the correct semantics, and navigate to the target while obeying the underlying physical rules. We train RL agents with both continuous and discrete action spaces and show their ability to generalize to new, unseen environments. In particular, we observe that (1) training is substantially harder on large house sets but results in better generalization, (2) using semantic signals (e.g., a segmentation mask) boosts generalization performance, and (3) gated networks on the semantic input signal lead to improved training performance and generalization. We hope that House3D, together with the analysis of the RoomNav task, serves as a building block towards designing practical intelligent systems, and that it will be broadly adopted by the community.