Understanding and manipulating diverse 3D objects in everyday human environments is essential yet challenging for future home-assistant robots. Towards building scalable systems that can perform diverse manipulation tasks over various 3D shapes, recent works have advocated, and demonstrated promising results in, learning visual actionable affordance, which labels every point on the input 3D geometry with the likelihood that acting at that point accomplishes the downstream task (e.g., pushing or picking up). However, these works studied only single-gripper manipulation tasks, whereas many real-world tasks require two hands working collaboratively. In this work, we propose a novel learning framework, DualAfford, to learn collaborative affordance for dual-gripper manipulation tasks. The core design of the approach is to reduce the quadratic problem for two grippers into two disentangled yet interconnected subtasks for efficient learning. Using the large-scale PartNet-Mobility and ShapeNet datasets, we set up four benchmark tasks for dual-gripper manipulation. Experiments demonstrate the effectiveness of our method and its superiority over three baselines.
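As a rough illustration of the decomposition idea (a toy sketch, not the DualAfford implementation), the snippet below selects two contact points sequentially: the first gripper is scored over all N points, and the second gripper is then scored over all N points conditioned on the first choice, so roughly 2N scores are evaluated rather than N × N pair scores. The functions `first_gripper_scores` and `second_gripper_scores` are hypothetical hand-written heuristics standing in for learned affordance predictors.

```python
import math

def first_gripper_scores(points):
    """Hypothetical stand-in for a learned per-point affordance of gripper 1."""
    # Toy heuristic (assumption): prefer points with a large x coordinate.
    return [p[0] for p in points]

def second_gripper_scores(points, first_point):
    """Hypothetical stand-in for gripper 2's affordance, conditioned on gripper 1."""
    # Toy heuristic (assumption): prefer points far from the first contact.
    return [math.dist(p, first_point) for p in points]

def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda k: values[k])

def select_contacts(points):
    """Pick two contact points with N + N score evaluations instead of N * N."""
    i = argmax(first_gripper_scores(points))
    j = argmax(second_gripper_scores(points, points[i]))
    return i, j

# Four toy points standing in for a 3D point cloud.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.5, 0.5, 0.0)]
print(select_contacts(points))
```

The second scoring function takes the first contact as an input, which is one simple way to keep the two subtasks separate yet dependent on each other.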