This paper tackles the task of goal-conditioned dynamic manipulation of deformable objects. This task is highly challenging due to its complex dynamics (introduced by object deformation and high-speed action) and strict task requirements (defined by a precise goal specification). To address these challenges, we present Iterative Residual Policy (IRP), a general learning framework applicable to repeatable tasks with complex dynamics. IRP learns an implicit policy via delta dynamics: instead of modeling the entire dynamical system and inferring actions from that model, IRP learns delta dynamics that predict the effects of a delta action on the previously observed trajectory. When combined with adaptive action sampling, the system can quickly optimize its actions online to reach a specified goal. We demonstrate the effectiveness of IRP on two tasks: whipping a rope to hit a target point and swinging a cloth to reach a target pose. Despite being trained only in simulation on a fixed robot setup, IRP efficiently generalizes to noisy real-world dynamics, new objects with unseen physical properties, and even different robot hardware embodiments, demonstrating its excellent generalization capability relative to alternative approaches. Video is available at https://youtu.be/7h3SZ3La-oA
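To make the iterative refinement concrete, below is a minimal sketch of the loop the abstract describes: sample candidate delta actions, score each by the trajectory the learned delta-dynamics model predicts, execute the best candidate, and repeat. The interfaces `env.execute` and `delta_model`, the point-goal cost, and all hyperparameters (`n_iters`, `n_samples`, `sigma`, `shrink`) are illustrative assumptions, not the paper's actual implementation; the sigma-shrinking schedule is only a crude stand-in for the paper's adaptive action sampling.

```python
import numpy as np

def cost(trajectory, goal):
    """Distance from the closest trajectory point to the goal point.
    Loosely matches the rope-whipping task, where any point along the
    rope's swept trajectory may hit the target."""
    return np.min(np.linalg.norm(trajectory - goal, axis=-1))

def irp_loop(env, delta_model, goal, init_action,
             n_iters=10, n_samples=64, sigma=0.3, shrink=0.7):
    """Hypothetical IRP-style control loop.

    delta_model(trajectory, delta) -> predicted trajectory that would
    result from perturbing the last executed action by `delta`,
    conditioned on the previously observed trajectory rather than a
    full model of the dynamical system.
    env.execute(action) -> observed trajectory from a real rollout.
    """
    action = np.asarray(init_action, dtype=float)
    trajectory = env.execute(action)  # observe one real rollout first
    for _ in range(n_iters):
        # Sample candidate delta actions around the current action.
        deltas = np.random.normal(0.0, sigma,
                                  size=(n_samples,) + action.shape)
        # Score each candidate by its predicted trajectory's goal cost.
        costs = [cost(delta_model(trajectory, d), goal) for d in deltas]
        best = deltas[int(np.argmin(costs))]
        action = action + best
        trajectory = env.execute(action)  # re-observe, then iterate
        sigma *= shrink  # stand-in for adaptive action sampling
    return action, trajectory
```

Because each iteration only predicts the effect of a small perturbation on an already-observed trajectory, errors in the learned model matter less than they would for a full forward-dynamics model, which is consistent with the generalization behavior the abstract reports.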