Understanding and manipulating deformable objects (e.g., ropes and fabrics) is an essential yet challenging task with broad applications. The difficulty stems from the complex states and dynamics, diverse configurations, and high-dimensional action spaces of deformable objects. Moreover, manipulation tasks usually require multiple steps to accomplish, and greedy policies can easily become trapped in locally optimal states. Existing studies usually tackle this problem with reinforcement learning or by imitating expert demonstrations, but these approaches are limited in modeling complex states or require hand-crafted expert policies. In this paper, we study deformable object manipulation using dense visual affordance, which generalizes to diverse states, and propose a novel kind of foresightful dense affordance that avoids local optima by estimating the value of states for long-term manipulation. We propose a framework for learning this representation, with novel designs such as multi-stage stable learning and efficient self-supervised data collection without experts. Experiments demonstrate the superiority of our proposed foresightful dense affordance. Project page: https://hyperplane-lab.github.io/DeformableAffordance
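To make the contrast between a greedy policy and a foresightful affordance concrete, the following is a minimal toy sketch, not the framework proposed in the paper. It assumes a hand-made transition table in which each candidate pick point (a pixel coordinate) yields an immediate gain and a next state; the state names, pixels, gains, and the helper functions `state_value`, `dense_affordance`, and `best_pixel` are all hypothetical. The paper learns affordance from visual observations, whereas this sketch uses exhaustive lookahead on a tiny table purely to show why scoring actions by long-term state value can avoid a local optimum.

```python
# Toy illustration only: a hand-made task, not the paper's model or data.
# Every state name, pixel, and gain below is a hypothetical assumption.

# state -> {pick pixel: (immediate gain, resulting state)}
TRANSITIONS = {
    "crumpled": {(0, 0): (0.30, "stuck"), (0, 1): (0.10, "half_open")},
    "stuck": {(0, 0): (0.05, "stuck")},
    "half_open": {(1, 1): (0.80, "flat")},
    "flat": {},
}


def state_value(state, steps):
    """Best total gain reachable from `state` within `steps` actions."""
    actions = TRANSITIONS[state]
    if steps == 0 or not actions:
        return 0.0
    return max(gain + state_value(nxt, steps - 1)
               for gain, nxt in actions.values())


def dense_affordance(state, steps):
    """Score every candidate pixel: immediate gain plus the value of the
    state it leads to, looking `steps` actions ahead in total."""
    return {pixel: gain + state_value(nxt, steps - 1)
            for pixel, (gain, nxt) in TRANSITIONS[state].items()}


def best_pixel(affordance):
    """Pick the pixel with the highest affordance score."""
    return max(affordance, key=affordance.get)


# Horizon 1 scores only the immediate gain (greedy behavior).
greedy = best_pixel(dense_affordance("crumpled", steps=1))
# Horizon 3 also accounts for the value of the states reached later.
foresightful = best_pixel(dense_affordance("crumpled", steps=3))

print("greedy pick:", greedy)
print("foresightful pick:", foresightful)
```

In this toy table the greedy choice takes the larger immediate gain and ends in a state from which little further progress is possible, while the longer-horizon score prefers the pick that leads toward the flat state.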