Motivated by the need to estimate the 3D pose of arbitrary objects, we consider the challenging problem of class-agnostic object viewpoint estimation from images alone, without CAD model knowledge. The idea is to leverage features learned on seen classes to estimate the pose of unseen classes that nevertheless share similar geometries and canonical frames with the seen ones. We train a direct pose estimator in a class-agnostic way by sharing weights across all object classes, and we introduce a contrastive learning method with three main ingredients: (i) the use of pre-trained, self-supervised, contrast-based features; (ii) pose-aware data augmentations; (iii) a pose-aware contrastive loss. We experiment on Pascal3D+, ObjectNet3D and Pix3D in a cross-dataset fashion, with both seen and unseen classes. We report state-of-the-art results, including against methods that additionally use CAD models as input.
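To make ingredient (iii) more concrete, the following is a minimal sketch of what a pose-aware contrastive loss could look like. It is not taken from the abstract, which does not specify the formulation: it assumes an InfoNCE-style loss in which each negative pair is weighted by the normalized geodesic distance between the two object rotations, so that images with similar viewpoints are pushed apart less. The function names, the temperature value and the weighting scheme are all illustrative assumptions, and the positive views `z_aug` are assumed to come from augmentations that leave the viewpoint unchanged.

```python
import numpy as np


def geodesic_distance(R1, R2):
    """Rotation angle in radians between two 3x3 rotation matrices."""
    cos = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def pose_aware_info_nce(z, z_aug, rotations, temperature=0.5):
    """Hypothetical pose-weighted InfoNCE loss (an assumption, not the paper's code).

    z, z_aug  : (N, D) features of N images and of their pose-preserving views.
    rotations : (N, 3, 3) ground-truth object rotations.
    """
    # Cosine similarity via L2-normalized features.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_aug = z_aug / np.linalg.norm(z_aug, axis=1, keepdims=True)
    sim = np.exp(z @ z_aug.T / temperature)  # (N, N)

    # Weight of each negative pair: rotation distance scaled to [0, 1].
    n = len(z)
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i, j] = geodesic_distance(rotations[i], rotations[j]) / np.pi

    pos = np.diag(sim)            # similarity to the sample's own augmented view
    neg = (w * sim).sum(axis=1)   # pose-weighted negatives (diagonal weight is 0)
    return float(np.mean(-np.log(pos / (pos + neg))))
```

Under this assumed weighting, two images showing the same viewpoint contribute nothing as negatives, while images with opposite viewpoints contribute with full weight, which encourages the feature space to vary with pose rather than with object class.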