Recently, learning-based approaches to 3D model reconstruction have attracted attention owing to modern applications such as Extended Reality (XR), robotics, and self-driving cars. Several approaches have shown good performance in reconstructing 3D shapes by learning solely from images, i.e., without using 3D models during training. Challenges remain, however, in texture generation due to the gap between the 2D and 3D modalities. In previous work, the grid sampling mechanism from Spatial Transformer Networks was adopted to sample colors from an input image to form a texture. Despite its success, the existing framework limits the search scope during sampling, resulting in flaws in the generated texture and, consequently, in the rendered 3D models. In this paper, to resolve that issue, we present a novel sampling algorithm that optimizes the gradient of the predicted coordinates based on the variance of the sampled image. To take the semantics of the image into account, we adopt the Fréchet Inception Distance (FID) to form a loss function for learning, which helps bridge the gap between rendered images and input images. As a result, we greatly improve the generated texture. Furthermore, to optimize 3D shape reconstruction and to accelerate convergence during training, we adopt part segmentation and template learning in our model. Without any 3D supervision and with only a collection of single-view 2D images, the shape and texture learned by our model outperform those from previous work. We demonstrate this performance with experimental results on a publicly available dataset.
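To make the grid-sampling mechanism concrete, the following is a minimal sketch in PyTorch (the framework choice and the names `sample_texture` and `pred_coords` are illustrative assumptions, not the paper's implementation): a network predicts one image coordinate per texel, and `grid_sample` bilinearly reads the color at each coordinate, which keeps the operation differentiable with respect to the predicted coordinates.

```python
import torch
import torch.nn.functional as F

def sample_texture(image: torch.Tensor, pred_coords: torch.Tensor) -> torch.Tensor:
    """Sample per-texel colors from an input image at predicted coordinates.

    image:       (B, 3, H, W) input view.
    pred_coords: (B, T, 2) predicted sampling locations in normalized
                 image coordinates in [-1, 1], one per texel.
    Returns:     (B, 3, T) sampled texel colors.
    """
    B, T, _ = pred_coords.shape
    # grid_sample expects a (B, H_out, W_out, 2) grid; treat the T texels
    # as a 1 x T output "image".
    grid = pred_coords.view(B, 1, T, 2)
    # Bilinear sampling is the Spatial Transformer Networks trick: gradients
    # flow back through both the image values and the sampling coordinates.
    texels = F.grid_sample(image, grid, mode='bilinear', align_corners=True)
    return texels.view(B, 3, T)
```

The limited search scope criticized above arises because each texel's gradient only sees the few pixels touched by bilinear interpolation around its current coordinate; the variance-based gradient proposed in the paper is aimed at widening that effective scope.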
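The FID term mentioned above compares Gaussians fitted to Inception features of the input and rendered image sets. Below is a minimal sketch of the standard metric, assuming NumPy/SciPy and feature matrices computed separately by a pretrained Inception network (`feats_real` and `feats_fake` are hypothetical names); the paper builds a training loss from this distance, whereas the sketch only shows the metric itself.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_*: (N, D) Inception activations of input vs. rendered images.
    FID = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2})
    """
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    # Matrix square root of the covariance product.
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    # Numerical noise can introduce a tiny imaginary component.
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))
```

Because FID is computed over feature statistics of image sets rather than per-pixel differences, it rewards renderings whose texture statistics match the input distribution, which is what "taking the semantics of the image into account" refers to in the abstract.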