Inferring the stereo (3D) structure of real-world objects is a challenging yet practical task. Equipping deep models with this ability usually requires abundant 3D supervision, which is hard to acquire. Synthetic data is a promising alternative, because paired ground truth is easy to obtain there. Nevertheless, the domain gap between synthetic and real data is nontrivial, given the variation in texture, shape, and context. To overcome these difficulties, we propose a Visio-Perceptual Adaptive Network for single-view 3D reconstruction, dubbed VPAN. To generalize the model to real scenarios, VPAN addresses three aspects: (1) Look: visually incorporate spatial structure from the single view to enhance the expressiveness of the representation; (2) Cast: perceptually align the 2D image features to the 3D shape priors with cross-modal semantic contrastive mapping; (3) Mold: reconstruct the stereo shape of the target by transforming embeddings into the desired manifold. Extensive experiments on several benchmarks demonstrate the effectiveness and robustness of the proposed method in learning the 3D shape manifold from synthetic data using a single view. The proposed method outperforms state-of-the-art methods on the Pix3D dataset with an IoU of 0.292 and a Chamfer distance (CD) of 0.108, and reaches an IoU of 0.329 and a CD of 0.104 on Pascal 3D+.
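For readers unfamiliar with the two reported metrics, they can be sketched as follows. This is a minimal illustration only, assuming voxel occupancy grids for IoU and point sets for CD. The function names `voxel_iou` and `chamfer_distance`, the 0.5 occupancy threshold, and the squared-distance, mean-reduced form of CD are all assumptions made for this sketch; they are not necessarily the evaluation protocol behind the numbers above, since CD conventions in particular vary between papers.

```python
import numpy as np


def voxel_iou(pred, target, threshold=0.5):
    """Intersection over union of two occupancy grids of the same shape.

    The threshold used to binarize the grids is an assumption of this sketch.
    """
    p = np.asarray(pred) >= threshold
    t = np.asarray(target) >= threshold
    union = np.logical_or(p, t).sum()
    if union == 0:
        # Both grids are empty; treating this as a perfect match is a convention.
        return 1.0
    return float(np.logical_and(p, t).sum() / union)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Uses the mean of squared nearest-neighbor distances in both directions
    (one common variant; other papers use unsquared distances or sums).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise squared Euclidean distances, shape (N, M).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())
```

Higher IoU and lower CD indicate a better reconstruction. The brute-force pairwise distance matrix is fine for a few thousand points; larger point clouds would normally use a spatial index instead.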