Understanding and manipulating deformable objects (e.g., ropes and fabrics) is an essential yet challenging task with broad applications. The difficulties stem from the complex states and dynamics, diverse configurations, and high-dimensional action spaces of deformable objects. Moreover, manipulation tasks usually require multiple steps to accomplish, and greedy policies can easily get stuck in locally optimal states. Existing studies usually tackle this problem with reinforcement learning or by imitating expert demonstrations, which are limited in modeling complex states or require hand-crafted expert policies. In this paper, we study deformable object manipulation using dense visual affordance, which generalizes to diverse states, and propose a novel kind of foresightful dense affordance, which avoids local optima by estimating the value of states for long-term manipulation. We propose a framework for learning this representation, with novel designs such as multi-stage stable learning and efficient self-supervised data collection without experts. Experiments demonstrate the superiority of our proposed foresightful dense affordance. Project page: https://hyperplane-lab.github.io/DeformableAffordance