Acquiring labeled 6D poses from real images is an expensive and time-consuming task. Although massive amounts of synthetic RGB images are easy to obtain, models trained on them suffer from noticeable performance degradation due to the synthetic-to-real domain gap. To mitigate this degradation, we propose a practical self-supervised domain adaptation approach that takes advantage of real RGB(-D) data without requiring real pose labels. We first pre-train the model with synthetic RGB images and then use real RGB(-D) images to fine-tune the pre-trained model. The fine-tuning process is self-supervised by an RGB-based pose-aware consistency and a depth-guided object distance pseudo-label, and does not require time-consuming online differentiable rendering. We build our domain adaptation method on the recent pose estimator SC6D and evaluate it on the YCB-Video dataset. We experimentally demonstrate that our method achieves performance comparable to its fully supervised counterpart while outperforming existing state-of-the-art approaches.
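The following is a minimal sketch of the self-supervised fine-tuning step described above, not the authors' implementation. It assumes a PyTorch estimator that predicts a rotation (6D continuous representation), a 2D object center, and a scalar object distance; all names here (`PoseNet`, `augment`, `distance_pseudo_label`) are hypothetical placeholders. The two loss terms mirror the abstract's signals: agreement between pose predictions on two augmented views of the same real RGB image, and a distance pseudo-label extracted from the real depth map.

```python
# Hypothetical sketch of the self-supervised fine-tuning step; not the
# authors' code. Assumes a synthetic-pretrained estimator (e.g., SC6D-like).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseNet(nn.Module):
    """Toy stand-in for a synthetic-pretrained 6D pose estimator."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rot_head = nn.Linear(16, 6)   # 6D rotation representation
        self.ctr_head = nn.Linear(16, 2)   # projected 2D object center
        self.dist_head = nn.Linear(16, 1)  # object distance along camera z

    def forward(self, x):
        f = self.backbone(x)
        return self.rot_head(f), self.ctr_head(f), self.dist_head(f)

def augment(rgb):
    # Pose-preserving photometric augmentation (noise jitter stand-in).
    return (rgb + 0.05 * torch.randn_like(rgb)).clamp(0, 1)

def distance_pseudo_label(depth, mask):
    # Depth-guided pseudo-label: median depth inside the object mask
    # approximates the object's distance from the camera.
    return torch.stack([d[m].median() for d, m in zip(depth, mask)])[:, None]

model = PoseNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Dummy real RGB-D batch: image, depth map, and object mask.
rgb = torch.rand(4, 3, 64, 64)
depth = torch.rand(4, 64, 64) + 0.5
mask = torch.ones(4, 64, 64, dtype=torch.bool)

# RGB-based pose-aware consistency: predictions on two augmented views
# of the same unlabeled real image should agree.
r1, c1, d1 = model(augment(rgb))
r2, c2, d2 = model(augment(rgb))
loss_consistency = F.mse_loss(r1, r2) + F.mse_loss(c1, c2)

# Depth-guided distance pseudo-label supervises the distance head,
# avoiding any online differentiable rendering.
z_pseudo = distance_pseudo_label(depth, mask)
loss_distance = F.l1_loss(d1, z_pseudo) + F.l1_loss(d2, z_pseudo)

loss = loss_consistency + loss_distance
opt.zero_grad()
loss.backward()
opt.step()
```

In this sketch the consistency term needs no labels at all, while the depth map is used only to derive a cheap scalar pseudo-label, which is why the fine-tuning stage can run without rendering the object model online.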