6D pose estimation of rigid objects from a single RGB image has improved substantially in recent years, as deep learning copes with complex real-world variations. Most methods, however, build a separate model per object and therefore do not scale to many objects at once. In this paper, we present a novel approach for scalable 6D pose estimation that uses a single autoencoder trained by self-supervised learning on synthetic data of multiple objects. To handle multiple objects and generalize to unseen ones, we disentangle the latent shape and pose representations: the latent shape space models shape similarities, and the latent pose code is used for rotation retrieval by comparison with canonical rotations. To encourage a well-structured shape space, we apply contrastive metric learning, which lets the method process unseen objects by referring to similar training objects. Because different objects have different symmetries, their latent pose spaces are inconsistent; we capture this with a conditioned block that re-entangles the shape and pose representations to produce shape-dependent pose codebooks. We test our method on two multi-object benchmarks with real data, T-LESS and NOCS REAL275, and show that it outperforms existing RGB-based methods in pose estimation accuracy and generalization.
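The rotation-retrieval step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the comparison is a cosine-similarity nearest-neighbor lookup of a latent pose code against a codebook of codes for canonical rotations, which the abstract does not specify. The names `retrieve_rotation` and `rot_z` are hypothetical, and the codebook here is a fixed random array standing in for the shape-dependent codebook that the conditioned block would produce.

```python
import numpy as np


def rot_z(theta):
    """Rotation matrix about the z-axis (used only to build toy canonical rotations)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])


def retrieve_rotation(pose_code, codebook, rotations):
    """Return the canonical rotation whose code is most similar to pose_code.

    pose_code: (D,) latent pose code of the query image (assumed given).
    codebook:  (N, D) codes of N canonical rotations (assumed given).
    rotations: (N, 3, 3) rotation matrices aligned with the codebook rows.
    """
    # Normalize both sides so the dot product equals cosine similarity.
    q = pose_code / np.linalg.norm(pose_code)
    c = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    sims = c @ q
    best = int(np.argmax(sims))
    return rotations[best], best, float(sims[best])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins: 16 canonical in-plane rotations with random 8-D codes.
    rotations = np.stack([rot_z(2 * np.pi * k / 16) for k in range(16)])
    codebook = rng.normal(size=(16, 8))
    # A query that is a rescaled copy of one codebook entry.
    R, idx, sim = retrieve_rotation(3.0 * codebook[5], codebook, rotations)
    print(idx, round(sim, 6))
```

Cosine similarity is used here because it ignores the magnitude of the code, so only its direction in latent space decides the match. A real system would use a much denser sampling of SO(3) than the 16 in-plane rotations of this toy example.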