We explore learning pixelwise correspondences between images of deformable objects in different configurations. Traditional correspondence matching approaches such as SIFT, SURF, and ORB can fail to provide sufficient contextual information for fine-grained manipulation. We propose the Multi-Modal Gaussian Shape Descriptor (MMGSD), a new visual representation of deformable objects that extends ideas from dense object descriptors to predict all symmetric correspondences between different object configurations. MMGSD is learned in a self-supervised manner from synthetic data and produces correspondence heatmaps with measurable uncertainty. In simulation, experiments suggest that MMGSD can achieve an RMSE of 32.4 and 31.3 for square cloth and braided synthetic nylon rope, respectively. This is an average improvement of 47.7% over a baseline trained with a symmetric pixel-wise contrastive loss (SPCL), which, unlike MMGSD, does not enforce distributional continuity.
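To make the heatmap idea concrete, the following is a minimal sketch (not the authors' released code) of the kind of supervision the abstract describes: for a query pixel, the target is a mixture of 2D Gaussians centered on all symmetrically equivalent correspondences, and the network's predicted heatmap is matched to it with a distributional loss. The function names, the sigma value, and the image size are illustrative assumptions.

```python
# A minimal sketch, assuming heatmap targets are mixtures of 2D Gaussians
# over all symmetric correspondences and training uses a KL-divergence loss.
import torch
import torch.nn.functional as F

def make_target(corr_pixels, height, width, sigma=4.0):
    """Mixture-of-Gaussians target over all symmetric correspondences.

    corr_pixels: (K, 2) tensor of (row, col) locations that are all valid
    matches for one query pixel (e.g., the rotational symmetries of a
    square cloth). Returns an (H, W) distribution summing to 1.
    """
    ys = torch.arange(height).view(-1, 1).float()
    xs = torch.arange(width).view(1, -1).float()
    target = torch.zeros(height, width)
    for r, c in corr_pixels:
        target += torch.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2 * sigma ** 2))
    return target / target.sum()

def heatmap_loss(pred_logits, target):
    """KL divergence between the predicted heatmap and the Gaussian target."""
    log_pred = F.log_softmax(pred_logits.view(-1), dim=0)
    return F.kl_div(log_pred, target.view(-1), reduction="sum")

# Toy usage: two symmetric matches for one query pixel on a 64x64 image.
corr = torch.tensor([[10.0, 20.0], [54.0, 44.0]])
target = make_target(corr, 64, 64)
pred_logits = torch.randn(64, 64)          # stand-in for a network output
print(heatmap_loss(pred_logits, target))   # scalar training loss
```

Because the target is a smooth distribution rather than a single positive pixel, this style of loss enforces the distributional continuity contrasted with SPCL above, and the spread of the predicted heatmap gives a measurable notion of uncertainty.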