Motivated by the need to estimate the pose (viewpoint) of arbitrary objects in the wild, a problem covered only by scarce and small datasets, we consider the challenging problem of class-agnostic 3D object pose estimation without 3D shape knowledge. The idea is to leverage features learned on seen classes to estimate the pose of unseen classes that nonetheless share similar geometries and canonical frames with the seen ones. To this end, we train a direct pose estimator in a class-agnostic way by sharing weights across all object classes, and we introduce a contrastive learning method with three main ingredients: (i) pre-trained, self-supervised, contrast-based features; (ii) pose-aware data augmentations; and (iii) a pose-aware contrastive loss. We experiment on Pascal3D+ and ObjectNet3D, as well as on Pix3D in a cross-dataset setting, with both seen and unseen classes. We report state-of-the-art results, including against methods that use additional shape information, and also when using detected bounding boxes.
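The abstract does not give the exact form of the pose-aware contrastive loss. As a rough illustration only, the sketch below assumes a supervised-contrastive (SupCon-style) formulation in which two samples count as positives when the geodesic distance between their rotations falls below a threshold. The function names, the 30° threshold, and the temperature are hypothetical choices for this sketch, not the paper's definitions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rot_z(theta: float) -> Matrix:
    """Rotation matrix about the z-axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def geodesic_distance(r1: Matrix, r2: Matrix) -> float:
    """Angle (radians, in [0, pi]) of the relative rotation r1^T r2."""
    # trace(r1^T r2) equals the element-wise (Frobenius) inner product of r1 and r2.
    tr = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp for numerical safety
    return math.acos(cos_angle)


def normalize(v: Sequence[float]) -> List[float]:
    """L2-normalize a vector (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def pose_aware_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    rotations: Sequence[Matrix],
    threshold: float = math.pi / 6,  # hypothetical: poses within 30 degrees are positives
    temperature: float = 0.1,        # hypothetical temperature
) -> float:
    """SupCon-style loss where positives are defined by pose proximity (assumed form)."""
    z = [normalize(e) for e in embeddings]
    n = len(z)
    # Cosine similarities scaled by the temperature.
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
           for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and geodesic_distance(rotations[i], rotations[j]) < threshold]
        if not positives:
            continue  # anchors without a pose-close partner contribute nothing
        # Numerically stable log-sum-exp over all other samples (the denominator).
        others = [sim[i][a] for a in range(n) if a != i]
        m = max(others)
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        # Average negative log-probability of each positive.
        total += -sum(sim[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

For example, with rotations `rot_z(0.0)`, `rot_z(0.1)`, `rot_z(2.0)`, `rot_z(2.1)`, embeddings that cluster the first two and the last two samples yield a much lower loss than embeddings that separate pose-close samples. Defining positives by a rotation distance, rather than by instance or class identity, is what makes such a loss "pose-aware" in this sketch, and it does not depend on class labels.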