6D object pose estimation is one of the fundamental problems in computer vision and robotics. While much recent effort has gone into generalizing pose estimation to novel object instances within the same category, namely category-level 6D pose estimation, it remains restricted to constrained environments given the limited amount of annotated data. In this paper, we collect Wild6D, a new unlabeled RGBD object video dataset with diverse instances and backgrounds. We utilize this data to generalize category-level 6D object pose estimation in the wild via semi-supervised learning. We propose a new model, the Rendering for Pose estimation network (RePoNet), which is jointly trained using the free ground truths that come with the synthetic data and a silhouette-matching objective function on the real-world data. Without using any 3D annotations on real data, our method outperforms state-of-the-art methods on the previous dataset and on our Wild6D test set (with manual annotations for evaluation) by a large margin. Project page with Wild6D data: https://oasisyang.github.io/semi-pose .