Manipulating volumetric deformable objects in the real world, like plush toys and pizza dough, brings substantial challenges due to infinite shape variations, non-rigid motions, and partial observability. We introduce ACID, an action-conditional visual dynamics model for volumetric deformable objects based on structured implicit neural representations. ACID integrates two new techniques: implicit representations for action-conditional dynamics and geodesics-based contrastive learning. To represent deformable dynamics from partial RGB-D observations, we learn implicit representations of occupancy and flow-based forward dynamics. To accurately identify state changes under large non-rigid deformations, we learn a correspondence embedding field through a novel geodesics-based contrastive loss. To evaluate our approach, we develop a simulation framework for manipulating complex deformable shapes in realistic scenes and a benchmark containing over 17,000 action trajectories with six types of plush toys and 78 variants. Our model achieves the best performance in geometry, correspondence, and dynamics prediction among existing approaches. When applied to goal-conditioned deformable manipulation tasks, the ACID dynamics model yields a 30% increase in task success rate over the strongest baseline. For more results and information, please visit https://b0ku1.github.io/acid/ .
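To make the geodesics-based contrastive learning idea concrete, below is a minimal sketch of one plausible form of such a loss: points that are close in geodesic distance on the deformed surface are pulled together in embedding space, while geodesically distant points are pushed apart by a hinge margin. The function name, thresholds, and margin are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a geodesics-based contrastive loss (assumed form,
# not the paper's exact loss). Inputs are per-point embeddings sampled from
# a correspondence embedding field and precomputed pairwise geodesic distances.
import torch
import torch.nn.functional as F

def geodesic_contrastive_loss(emb, geo_dist, pos_thresh=0.05, neg_thresh=0.2,
                              margin=1.0):
    """emb: (N, D) embeddings at N surface points.
    geo_dist: (N, N) pairwise geodesic distances on the object surface.
    Pairs within pos_thresh are treated as positives (attracted); pairs
    beyond neg_thresh are negatives (repelled to at least `margin`)."""
    emb = F.normalize(emb, dim=-1)
    pdist = torch.cdist(emb, emb)              # (N, N) embedding distances
    pos_mask = geo_dist < pos_thresh
    neg_mask = geo_dist > neg_thresh
    pos_loss = (pdist[pos_mask] ** 2).mean()                   # pull positives together
    neg_loss = (F.relu(margin - pdist[neg_mask]) ** 2).mean()  # push negatives apart
    return pos_loss + neg_loss
```

Using geodesic rather than Euclidean distance to define positives is what makes such a loss robust to large non-rigid deformations: two points on opposite sides of a folded plush toy can be close in 3D space yet remain far apart along the surface.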