It is essential yet challenging for future home-assistant robots to understand and manipulate diverse 3D objects in everyday human environments. Towards building scalable systems that can perform diverse manipulation tasks over various 3D shapes, recent works have advocated and demonstrated promising results in learning visual actionable affordance, which labels every point over the input 3D geometry with the likelihood that acting there accomplishes the downstream task (e.g., pushing or picking up). However, these works only studied single-gripper manipulation tasks, whereas many real-world tasks require two hands working collaboratively. In this work, we propose a novel learning framework, DualAfford, to learn collaborative affordance for dual-gripper manipulation tasks. The core design of the approach is to reduce the quadratic problem over two grippers into two disentangled yet interconnected subtasks for efficient learning. Using the large-scale PartNet-Mobility and ShapeNet datasets, we set up four benchmark tasks for dual-gripper manipulation. Experiments demonstrate the effectiveness of our method and its superiority over three baselines.
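To make the disentangled two-stage design concrete, below is a minimal PyTorch sketch of the general idea: score every point of the input shape for the first gripper, then re-score the shape for the second gripper conditioned on the first gripper's proposal, so the quadratic joint search becomes two sequential per-point predictions. The encoder, module names, and dimensions here are hypothetical placeholders for illustration and do not reproduce the paper's actual architecture.

```python
# Hypothetical sketch of a disentangled two-stage affordance pipeline.
# All modules and dimensions are illustrative stand-ins, not DualAfford's networks.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PerPointEncoder(nn.Module):
    """Toy point-wise feature extractor (placeholder for a point-cloud backbone)."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # (B, N, 3) -> (B, N, feat_dim)
        return self.mlp(pts)


class AffordanceHead(nn.Module):
    """Scores every point with an action likelihood, optionally conditioned on
    the first gripper's chosen contact point and action direction."""

    def __init__(self, feat_dim: int = 128, cond_dim: int = 0):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + cond_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, feats: torch.Tensor, cond: torch.Tensor = None) -> torch.Tensor:
        if cond is not None:
            # Broadcast the first gripper's proposal to every point before scoring.
            cond = cond.unsqueeze(1).expand(-1, feats.shape[1], -1)
            feats = torch.cat([feats, cond], dim=-1)
        return self.mlp(feats).squeeze(-1)  # (B, N) per-point affordance


# Stage 1: unconditional affordance for the first gripper.
# Stage 2: affordance for the second gripper, conditioned on the first proposal.
encoder = PerPointEncoder()
head1 = AffordanceHead(cond_dim=0)
head2 = AffordanceHead(cond_dim=6)  # contact point (3) + action direction (3)

pts = torch.rand(2, 1024, 3)                       # batch of 2 toy point clouds
feats = encoder(pts)
aff1 = head1(feats)                                # (2, 1024)
best = aff1.argmax(dim=1)                          # first gripper's contact point index
contact1 = pts[torch.arange(2), best]              # (2, 3)
dir1 = F.normalize(torch.randn(2, 3), dim=-1)      # illustrative action direction
aff2 = head2(feats, torch.cat([contact1, dir1], dim=-1))  # (2, 1024)
```

In this sketch the two heads are trained on the same per-point features, so the two subtasks stay interconnected while each gripper's prediction remains a simple per-point scoring problem rather than a joint search over all point pairs.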