Physically rearranging objects is an important capability for embodied agents. Visual room rearrangement evaluates an agent's ability to rearrange objects in a room into a desired goal configuration based solely on visual input. We propose a simple yet effective method for this problem: (1) search for and map the objects that need to be rearranged, and (2) rearrange each object until the task is complete. Our approach consists of an off-the-shelf semantic segmentation model, a voxel-based semantic map, and a semantic search policy that together efficiently find the objects that need to be rearranged. On the AI2-THOR Rearrangement Challenge, our method raises the rate of correct rearrangement from 0.53%, achieved by current state-of-the-art end-to-end reinforcement learning methods that learn visual rearrangement policies, to 16.56%, while using only 2.7% as many samples from the environment.
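To make the mapping step concrete, the following is a minimal sketch of how a voxel-based semantic map could be used to flag objects that need to be rearranged. It is an illustration under stated assumptions, not the implementation evaluated in this work: the function names `build_semantic_map` and `objects_to_rearrange`, the 5 cm voxel size, the majority-vote labelling, and the exact set comparison between maps are all hypothetical choices. Input points are assumed to be depth pixels already back-projected to world coordinates and labelled by a segmentation model.

```python
from collections import Counter, defaultdict

VOXEL_SIZE = 0.05  # metres per voxel edge; an assumed value for illustration


def build_semantic_map(points, voxel_size=VOXEL_SIZE):
    """Map each occupied voxel to the majority class label of its points.

    `points` is an iterable of (x, y, z, label) tuples in world coordinates.
    """
    votes = defaultdict(Counter)
    for x, y, z, label in points:
        # Floor division assigns every point to the voxel that contains it.
        voxel = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        votes[voxel][label] += 1
    # Keep the most frequent label per voxel to damp segmentation noise.
    return {voxel: counts.most_common(1)[0][0] for voxel, counts in votes.items()}


def objects_to_rearrange(goal_map, current_map):
    """Return the sorted labels whose occupied voxels differ between the maps."""

    def voxels_by_label(semantic_map):
        grouped = defaultdict(set)
        for voxel, label in semantic_map.items():
            grouped[label].add(voxel)
        return grouped

    goal = voxels_by_label(goal_map)
    current = voxels_by_label(current_map)
    return sorted(
        label
        for label in goal.keys() | current.keys()
        if goal.get(label, set()) != current.get(label, set())
    )


# Toy scene: the mug has moved along x, the table has not.
goal = build_semantic_map([(0.12, 0.02, 0.81, "mug"), (1.02, 0.02, 0.42, "table")])
current = build_semantic_map([(0.52, 0.02, 0.81, "mug"), (1.02, 0.02, 0.42, "table")])
print(objects_to_rearrange(goal, current))
```

Exact voxel-set equality is a deliberately crude disagreement test; a practical system would likely tolerate small offsets caused by depth and segmentation noise, for example by comparing per-label centroids against a distance threshold.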