It is essential yet challenging for future home-assistant robots to understand and manipulate diverse 3D objects in everyday human environments. Towards building scalable systems that can perform diverse manipulation tasks over various 3D shapes, recent works have advocated and demonstrated promising results in learning visual actionable affordance, which labels every point of the input 3D geometry with the likelihood that acting there accomplishes the downstream task (e.g., pushing or picking up). However, these works only studied single-gripper manipulation tasks, whereas many real-world tasks require two hands working collaboratively. In this work, we propose a novel learning framework, DualAfford, that learns collaborative affordance for dual-gripper manipulation tasks. The core design is to reduce the quadratic problem over two grippers into two disentangled yet interconnected subtasks for efficient learning. Using the large-scale PartNet-Mobility and ShapeNet datasets, we set up four benchmark tasks for dual-gripper manipulation. Experiments demonstrate the effectiveness and superiority of our method over three baselines.
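A minimal sketch of the idea described above, not the DualAfford implementation: a first module scores every point of the input geometry with an affordance value for gripper 1, and a second module re-scores the points for gripper 2 conditioned on the first gripper's chosen contact point, so the quadratic two-gripper search is split into two sequential per-point predictions. The class name `PerPointAffordance`, the plain per-point MLP (standing in for a real point-cloud backbone such as PointNet++), and the choice to condition only on the first contact's coordinates are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from typing import Optional


class PerPointAffordance(nn.Module):
    """Hypothetical module: scores every point with an action likelihood in [0, 1]."""

    def __init__(self, in_dim: int, cond_dim: int = 0, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, point_feats: torch.Tensor,
                cond: Optional[torch.Tensor] = None) -> torch.Tensor:
        # point_feats: (B, N, F) per-point features; cond: (B, C) broadcast to all points.
        if cond is not None:
            cond = cond.unsqueeze(1).expand(-1, point_feats.shape[1], -1)
            point_feats = torch.cat([point_feats, cond], dim=-1)
        return torch.sigmoid(self.mlp(point_feats)).squeeze(-1)  # (B, N)


if __name__ == "__main__":
    B, N, F = 2, 1024, 64
    feats = torch.randn(B, N, F)                 # stand-in per-point features
    pts = torch.randn(B, N, 3)                   # stand-in xyz coordinates

    first = PerPointAffordance(F)                # first-gripper affordance
    second = PerPointAffordance(F, cond_dim=3)   # second gripper, conditioned on first contact

    a1 = first(feats)                            # (B, N) scores for gripper 1
    idx = a1.argmax(dim=1)                       # highest-scoring contact per shape
    p1 = pts[torch.arange(B), idx]               # (B, 3) chosen first contact point
    a2 = second(feats, cond=p1)                  # (B, N) scores for gripper 2 given gripper 1
    print(a1.shape, a2.shape)
```

In the paper's framing, the interconnection runs in both directions during training, and the modules also predict gripper orientations; the sketch keeps only the conditional per-point scoring to show how the two-gripper search factorizes.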