Perceiving and manipulating 3D articulated objects (e.g., cabinets, doors) in human environments is an important yet challenging task for future home-assistant robots. The space of 3D articulated objects is exceptionally rich in its myriad semantic categories, diverse shape geometry, and complicated part functionality. Previous works mostly abstract the kinematic structure with estimated joint parameters and part poses as the visual representation for manipulating 3D articulated objects. In this paper, we propose object-centric actionable visual priors as a novel perception-interaction handshaking point, where the perception system outputs more actionable guidance than kinematic structure estimation by predicting dense geometry-aware, interaction-aware, and task-aware visual action affordance and trajectory proposals. We design an interaction-for-perception framework, VAT-Mart, to learn such actionable visual representations by simultaneously training a curiosity-driven reinforcement learning policy that explores diverse interaction trajectories and a perception module that summarizes and generalizes the explored knowledge for pointwise predictions across diverse shapes. Experiments using the large-scale PartNet-Mobility dataset in the SAPIEN environment prove the effectiveness of the proposed approach and show promising generalization capabilities to novel test shapes, unseen object categories, and real-world data. Project page: https://hyperplane-lab.github.io/vat-mart
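To make the notion of dense, per-point actionable visual priors concrete, the sketch below shows one way such a perception module could be structured: given an object point cloud and a task specification, it predicts an actionability score and a short interaction-trajectory proposal for every point. This is a minimal illustrative sketch, not the authors' released implementation; the module names, feature sizes, waypoint parameterization, and the simple per-point MLP encoder (standing in for a point-cloud backbone such as PointNet++) are assumptions made for illustration.

```python
# Hypothetical sketch of a perception module that outputs per-point
# actionable visual priors: an affordance score and a trajectory proposal
# for each point, conditioned on a task specification. Not the VAT-Mart code.

import torch
import torch.nn as nn


class ActionableVisualPriors(nn.Module):
    """Per-point affordance and trajectory prediction (illustrative only)."""

    def __init__(self, feat_dim: int = 128, traj_steps: int = 5, task_dim: int = 1):
        super().__init__()
        # Per-point encoder: a stand-in for a point-cloud backbone (e.g., PointNet++).
        self.point_encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Task embedding, e.g., the desired change of the articulated part's joint state.
        self.task_encoder = nn.Sequential(nn.Linear(task_dim, 32), nn.ReLU())
        # Actionability head: one score in [0, 1] per point for the given task.
        self.affordance_head = nn.Sequential(
            nn.Linear(feat_dim + 32, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )
        # Trajectory head: a short sequence of 6-DoF end-effector waypoints per point.
        self.traj_steps = traj_steps
        self.trajectory_head = nn.Sequential(
            nn.Linear(feat_dim + 32, 128), nn.ReLU(),
            nn.Linear(128, traj_steps * 6),
        )

    def forward(self, points: torch.Tensor, task: torch.Tensor):
        # points: (B, N, 3) object point cloud; task: (B, task_dim) task specification.
        B, N, _ = points.shape
        point_feat = self.point_encoder(points)               # (B, N, feat_dim)
        task_feat = self.task_encoder(task)                    # (B, 32)
        task_feat = task_feat.unsqueeze(1).expand(B, N, -1)    # broadcast task to every point
        fused = torch.cat([point_feat, task_feat], dim=-1)
        affordance = self.affordance_head(fused).squeeze(-1)   # (B, N) actionability scores
        trajectories = self.trajectory_head(fused).view(B, N, self.traj_steps, 6)
        return affordance, trajectories


if __name__ == "__main__":
    model = ActionableVisualPriors()
    pts = torch.rand(2, 1024, 3)            # two objects, 1024 points each
    task = torch.tensor([[0.3], [-0.5]])    # e.g., open by 0.3 rad / close by 0.5 rad
    aff, traj = model(pts, task)
    print(aff.shape, traj.shape)            # (2, 1024) and (2, 1024, 5, 6)
```

In the full framework described in the abstract, such a module would be trained jointly with a curiosity-driven reinforcement learning policy, with the policy's explored interaction trajectories serving as supervision for the per-point predictions.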