Manipulating volumetric deformable objects in the real world, like plush toys and pizza dough, brings substantial challenges due to infinite shape variations, non-rigid motions, and partial observability. We introduce ACID, an action-conditional visual dynamics model for volumetric deformable objects based on structured implicit neural representations. ACID integrates two new techniques: implicit representations for action-conditional dynamics and geodesics-based contrastive learning. To represent deformable dynamics from partial RGB-D observations, we learn implicit representations of occupancy and flow-based forward dynamics. To accurately identify state change under large non-rigid deformations, we learn a correspondence embedding field through a novel geodesics-based contrastive loss. To evaluate our approach, we develop a simulation framework for manipulating complex deformable shapes in realistic scenes and a benchmark containing over 17,000 action trajectories with six types of plush toys and 78 variants. Our model outperforms existing approaches in geometry, correspondence, and dynamics prediction. We apply the ACID dynamics model to goal-conditioned deformable manipulation tasks, where it yields a 30% increase in task success rate over the strongest baseline. Furthermore, we apply the simulation-trained ACID model directly to real-world objects and show success in manipulating them into target configurations. For more results and information, please visit https://b0ku1.github.io/acid/