Specifying tasks with videos is a powerful technique for acquiring novel and general robot skills. However, the need to reason over mechanics and dexterous interactions makes it challenging to scale learning of contact-rich manipulation. In this work, we focus on the problem of visual non-prehensile planar manipulation: given a video of an object in planar motion, find contact-aware robot actions that reproduce the same object motion. We propose a novel architecture, Differentiable Learning for Manipulation (\ours), that combines video-decoding neural models with priors from contact mechanics by leveraging differentiable optimization and finite-difference-based simulation. Through extensive simulated experiments, we investigate the interplay between traditional model-based techniques and modern deep learning approaches. We find that our modular and fully differentiable architecture performs better than learning-only methods on unseen objects and motions. Code is available at \url{https://github.com/baceituno/dlm}.
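To illustrate the finite-difference idea the abstract refers to, the sketch below shows one common way to backpropagate through a black-box simulator in PyTorch: wrap the rollout in a custom `torch.autograd.Function` whose backward pass estimates gradients by central differences. This is a minimal, hypothetical example, not the implementation from the repository; `simulate`, `FiniteDiffSim`, and all shapes and parameters are assumptions for illustration.

```python
import torch

def simulate(actions):
    # Hypothetical black-box physics rollout: maps a flat action vector
    # to a resulting object-pose vector. In the paper's setting this
    # would be a contact-mechanics simulator with no analytic gradients.
    return torch.tanh(actions) * 0.5  # stand-in dynamics

class FiniteDiffSim(torch.autograd.Function):
    """Wraps a non-differentiable simulator so that gradients flow
    through it via central finite differences."""

    @staticmethod
    def forward(ctx, actions, eps=1e-4):
        ctx.save_for_backward(actions)
        ctx.eps = eps
        with torch.no_grad():
            return simulate(actions)

    @staticmethod
    def backward(ctx, grad_output):
        (actions,) = ctx.saved_tensors
        eps = ctx.eps
        grads = torch.zeros_like(actions)
        with torch.no_grad():
            for i in range(actions.numel()):
                e = torch.zeros_like(actions)
                e.view(-1)[i] = eps
                # Central difference approximates column i of the
                # simulator Jacobian; contract it with grad_output.
                jvp = (simulate(actions + e) - simulate(actions - e)) / (2 * eps)
                grads.view(-1)[i] = (grad_output * jvp).sum()
        # One gradient per forward input: (actions, eps).
        return grads, None

# Usage: a pose-tracking loss sends gradients back to the actions,
# letting the simulator sit inside an end-to-end differentiable pipeline.
actions = torch.zeros(4, requires_grad=True)
target_poses = torch.tensor([0.1, -0.2, 0.3, 0.0])
loss = ((FiniteDiffSim.apply(actions) - target_poses) ** 2).sum()
loss.backward()
print(actions.grad)
```

In an architecture like the one described, such a layer would sit between the video-decoding neural models and the loss, so the contact-mechanics prior constrains learning while remaining trainable end to end.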