In this paper, we tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images. This task contrasts with the one considered by most existing deep learning methods, which typically assume that the test objects have been observed during training. To handle unseen objects, we follow a retrieval-based strategy and prevent the network from learning object-specific features by computing multi-scale local similarities between the query image and synthetically-generated reference images. We then introduce an adaptive fusion module that robustly aggregates these local similarities into a global similarity score for each image pair. Furthermore, we speed up the retrieval process with a fast retrieval strategy. Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method generalizes significantly better to unseen objects than previous works. Our code and pre-trained models are available at https://sailor-z.github.io/projects/Unseen_Object_Pose.html.
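To make the retrieval-based strategy concrete, the sketch below illustrates the core idea in PyTorch: per-location cosine similarities are computed between multi-scale query and reference feature maps, then fused into a single pairwise score used to rank rendered references. All names (`local_similarities`, `global_score`, `retrieve`) and the fixed scale weights are illustrative assumptions; in the paper, an adaptive fusion module learns the aggregation rather than using fixed weights.

```python
import torch
import torch.nn.functional as F

def local_similarities(query_feats, ref_feats):
    """Per-location cosine similarities at each scale.

    query_feats / ref_feats: lists of (C_s, H_s, W_s) feature maps
    extracted from the query and a rendered reference by a shared
    backbone (hypothetical interface).
    """
    sims = []
    for q, r in zip(query_feats, ref_feats):
        q = F.normalize(q.flatten(1), dim=0)  # (C_s, H_s*W_s), unit-norm per location
        r = F.normalize(r.flatten(1), dim=0)
        sims.append((q * r).sum(dim=0))       # (H_s*W_s,) cosine per location
    return sims

def global_score(sims, scale_weights):
    """Fuse local similarities into one pairwise score.

    Fixed `scale_weights` stand in for the paper's adaptive fusion
    module, which predicts the aggregation adaptively instead.
    """
    per_scale = torch.stack([s.mean() for s in sims])
    return (scale_weights * per_scale).sum()

def retrieve(query_feats, ref_db):
    """Score the query against every reference rendered at a known
    orientation and return the best match's pose.

    ref_db: list of (ref_feats, rotation) pairs (assumed layout).
    """
    w = torch.full((len(query_feats),), 1.0 / len(query_feats))
    scores = [global_score(local_similarities(query_feats, f), w)
              for f, _ in ref_db]
    return ref_db[int(torch.stack(scores).argmax())][1]
```

Under these assumptions, the estimated 3D orientation of an unseen object is simply the known orientation of the highest-scoring synthetic reference, which is what makes the approach independent of object-specific training.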