We introduce MegaPose, a method to estimate the 6D pose of novel objects, that is, objects unseen during training. At inference time, the method only assumes knowledge of (i) a region of interest displaying the object in the image and (ii) a CAD model of the observed object. The contributions of this work are threefold. First, we present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects. The shape and coordinate system of the novel object are provided as inputs to the network by rendering multiple synthetic views of the object's CAD model. Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner. Third, we introduce a large-scale synthetic dataset of photorealistic images of thousands of objects with diverse visual and shape properties and show that this diversity is crucial to obtain good generalization performance on novel objects. We train our approach on this large synthetic dataset and apply it without retraining to hundreds of novel objects in real images from several pose estimation benchmarks. Our approach achieves state-of-the-art performance on the ModelNet and YCB-Video datasets. An extensive evaluation on the 7 core datasets of the BOP challenge demonstrates that our approach achieves performance competitive with existing approaches that require access to the target objects during training. Code, dataset and trained models are available on the project page: https://megapose6d.github.io/.
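The two-stage pipeline described above (coarse hypothesis scoring followed by render&compare refinement) can be sketched in outline as follows. This is a hypothetical illustration only: `render`, `coarse_score`, and `refine` are stand-ins for the paper's renderer and learned networks, and their names and signatures are assumptions, not MegaPose's actual API.

```python
import numpy as np

def render(cad_model, pose):
    # Placeholder renderer: a real implementation would produce a synthetic
    # view of the CAD model at the given 6D pose.
    return np.zeros((64, 64, 3))

def coarse_score(observed, rendered):
    # Stand-in for the coarse classifier, which predicts whether the pose
    # error between the rendering and the observation is small enough to be
    # corrected by the refiner. Higher score = more promising hypothesis.
    return float(np.exp(-np.linalg.norm(observed - rendered)))

def refine(observed, cad_model, pose, n_iters=5):
    # Stand-in for the render&compare refiner: each iteration renders views
    # at the current estimate and would predict a 6D pose update from the
    # comparison (here the update is the identity).
    for _ in range(n_iters):
        _views = render(cad_model, pose)
        pose = pose  # a learned refiner would apply its predicted update
    return pose

def estimate_pose(observed, cad_model, pose_hypotheses):
    # Coarse stage: score each sampled pose hypothesis and keep the best.
    best = max(pose_hypotheses,
               key=lambda p: coarse_score(observed, render(cad_model, p)))
    # Refinement stage: iteratively correct the coarse estimate.
    return refine(observed, cad_model, best)
```

In the actual method, the hypotheses would come from sampling viewpoints around the object, and both stages operate on multiple rendered views rather than a single image.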