Virtualizing the physical world into virtual models has been a critical technique for robot navigation and planning in the real world. To foster manipulation of articulated objects in everyday life, this work explores building articulation models of indoor scenes through a robot's purposeful interactions in these scenes. Prior work on articulation reasoning primarily focuses on siloed objects of limited categories. To extend to room-scale environments, the robot has to efficiently and effectively explore a large-scale 3D space, locate articulated objects, and infer their articulations. We introduce an interactive perception approach to this task. Our approach, named Ditto in the House, discovers possible articulated objects through affordance prediction, interacts with these objects to produce articulated motions, and infers the articulation properties from the visual observations before and after each interaction. It tightly couples affordance prediction and articulation inference to improve both tasks. We demonstrate the effectiveness of our approach in both simulation and real-world scenes. Code and additional results are available at https://ut-austin-rpl.github.io/HouseDitto/
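To make the described pipeline concrete, the sketch below outlines the three stages named in the abstract: affordance prediction over a scene point cloud, interaction at high-affordance points, and articulation inference from the observations before and after each interaction. All class and function names here (ArticulationEstimate, predict_affordance, interact, infer_articulation, explore_scene) are hypothetical placeholders for illustration, not the paper's actual implementation.

```python
# A minimal sketch (under stated assumptions) of an interactive-perception loop:
# affordance prediction -> interaction -> articulation inference.
# All names below are hypothetical placeholders, not the authors' code.

from dataclasses import dataclass
import numpy as np


@dataclass
class ArticulationEstimate:
    joint_type: str             # e.g. "revolute" or "prismatic"
    axis_origin: np.ndarray     # a 3D point on the estimated joint axis
    axis_direction: np.ndarray  # unit direction of the estimated joint axis


def predict_affordance(points: np.ndarray) -> np.ndarray:
    """Placeholder affordance model: score each scene point by how likely
    an interaction there will produce articulated motion."""
    return np.random.rand(points.shape[0])


def interact(points: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Placeholder for the robot's interaction: push/pull near `target` and
    return the post-interaction observation (here, a perturbed copy)."""
    return points + 0.01 * np.random.randn(*points.shape)


def infer_articulation(before: np.ndarray, after: np.ndarray) -> ArticulationEstimate:
    """Placeholder articulation model: compare observations before and after
    the interaction to estimate joint parameters."""
    return ArticulationEstimate(
        joint_type="revolute",
        axis_origin=before.mean(axis=0),
        axis_direction=np.array([0.0, 0.0, 1.0]),
    )


def explore_scene(points: np.ndarray, num_interactions: int = 5):
    """Interactive-perception loop over a room-scale point cloud:
    interact at the highest-affordance points and infer their articulations."""
    scores = predict_affordance(points)
    estimates = []
    for idx in np.argsort(scores)[::-1][:num_interactions]:
        after = interact(points, points[idx])
        estimates.append(infer_articulation(points, after))
    return estimates


if __name__ == "__main__":
    scene = np.random.rand(1000, 3)  # toy stand-in for a fused RGB-D scan
    for est in explore_scene(scene):
        print(est.joint_type, est.axis_direction)
```

In this sketch the coupling between the two tasks is only implicit (affordance scores decide where articulation inference gets evidence); the paper additionally feeds articulation results back to improve affordance prediction.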