Keypoint detection and description play a central role in computer vision. Most existing methods predict at the scene level and do not return the object class of each keypoint. In this paper, we propose an object-centric formulation which, beyond the conventional setting, additionally requires identifying which object each interest point belongs to. With this fine-grained information, our framework enables further downstream applications, such as object-level matching and pose estimation in a cluttered environment. To get around the difficulty of collecting labels in the real world, we develop a sim2real contrastive learning mechanism that generalizes a model trained in simulation to real-world applications. The novelties of our training method are three-fold: (i) we integrate uncertainty into the learning framework to improve feature description of hard cases, e.g., less-textured or symmetric patches; (ii) we decouple the object descriptor into two output branches, intra-object salience and inter-object distinctness, resulting in a better pixel-wise description; (iii) we enforce cross-view semantic consistency for enhanced robustness in representation learning. Comprehensive experiments on image matching and 6D pose estimation verify the encouraging generalization ability of our method from simulation to reality. In particular, for 6D pose estimation, our method significantly outperforms typical unsupervised/sim2real methods, narrowing the gap to the fully supervised counterpart. Additional results and videos can be found at https://zhongcl-thu.github.io/rock/
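To make the object-level matching idea concrete, the following is a minimal sketch, not the method of this paper. It assumes each keypoint already comes with a descriptor vector and a predicted object label, and it uses mutual nearest-neighbour matching on cosine similarity as a stand-in matching rule. The function name `match_object_keypoints` and the toy labels are hypothetical.

```python
from math import sqrt


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def match_object_keypoints(desc_a, labels_a, desc_b, labels_b):
    """Match keypoints of image A to image B, object by object.

    A pair (i, j) is kept only if both keypoints carry the same object
    label and each is the other's nearest neighbour within that object.
    Returns a list of (index_in_a, index_in_b, label) tuples.
    """
    a = [l2_normalize(d) for d in desc_a]
    b = [l2_normalize(d) for d in desc_b]

    def nearest(src, src_labels, dst, dst_labels):
        result = []
        for i, d in enumerate(src):
            # Candidates are restricted to keypoints on the same object.
            cands = [j for j in range(len(dst)) if dst_labels[j] == src_labels[i]]
            if cands:
                result.append(max(cands, key=lambda j: cosine(d, dst[j])))
            else:
                result.append(None)
        return result

    a_to_b = nearest(a, labels_a, b, labels_b)
    b_to_a = nearest(b, labels_b, a, labels_a)
    return [
        (i, j, labels_a[i])
        for i, j in enumerate(a_to_b)
        if j is not None and b_to_a[j] == i
    ]


# Toy example with 2-D descriptors and hypothetical object labels.
matches = match_object_keypoints(
    [[1, 0], [0, 1], [1, 1]], ["mug", "mug", "box"],
    [[0.9, 0.1], [0.1, 0.9], [1, 0]], ["mug", "box", "mug"],
)
print(matches)  # [(0, 2, 'mug'), (2, 1, 'box')]
```

Restricting candidates by object label is what separates this from scene-level matching: a keypoint on one object can never be paired with a similar-looking keypoint on another.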