We propose a scalable, efficient and accurate approach to retrieve 3D models for objects in the wild. Our contribution is twofold. We first present a 3D pose estimation approach for object categories which significantly outperforms the state-of-the-art on Pascal3D+. Second, we use the estimated pose as a prior to retrieve 3D models which accurately represent the geometry of objects in RGB images. For this purpose, we render depth images from 3D models under our predicted pose and match learned image descriptors of RGB images against those of rendered depth images using a CNN-based multi-view metric learning approach. In this way, we are the first to report quantitative results for 3D model retrieval on Pascal3D+, where our method chooses the same models as human annotators for 50% of the validation images on average. In addition, we show that our method, which was trained purely on Pascal3D+, retrieves rich and accurate 3D models from ShapeNet given RGB images of objects in the wild.
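The retrieval step described above, matching a learned descriptor of the RGB image against descriptors of depth images rendered under the predicted pose, can be illustrated as a nearest-neighbor search in descriptor space. The following is a minimal sketch under stated assumptions, not the implementation used in this work: the function name `retrieve_model`, the Euclidean distance, and the three-dimensional toy descriptors are all hypothetical, and in practice the descriptors would come from the trained CNNs.

```python
import numpy as np


def retrieve_model(rgb_descriptor, depth_descriptors):
    """Return the index of the closest rendered-depth descriptor and all distances.

    Hypothetical helper: assumes descriptors are fixed-length vectors and that
    Euclidean distance is the learned metric's comparison function.
    """
    rgb = np.asarray(rgb_descriptor, dtype=float)
    depth = np.asarray(depth_descriptors, dtype=float)
    # One distance per candidate 3D model (one rendered depth image each).
    distances = np.linalg.norm(depth - rgb, axis=1)
    return int(np.argmin(distances)), distances


# Toy descriptors standing in for CNN outputs (assumed values).
rgb_descriptor = np.array([0.9, 0.1, 0.0])
depth_descriptors = np.array([
    [0.0, 1.0, 0.0],  # candidate model 0
    [1.0, 0.0, 0.0],  # candidate model 1
    [0.0, 0.0, 1.0],  # candidate model 2
])

best_index, distances = retrieve_model(rgb_descriptor, depth_descriptors)
print("retrieved model index:", best_index)
```

Because each candidate model is rendered only under the predicted pose, the search in this sketch compares one descriptor per model rather than one per model and viewpoint.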