3D reconstruction aims to recover 3D objects from 2D views. Previous work on 3D reconstruction has mainly focused on feature matching between views or on using CNNs as backbones. Recently, Transformers have proven effective in many computer vision applications. However, whether Transformers can be used for 3D reconstruction remains unclear. In this paper, we fill this gap by proposing 3D-RETR, which performs end-to-end 3D REconstruction with TRansformers. 3D-RETR first uses a pretrained Transformer to extract visual features from 2D input images. It then uses a Transformer decoder to obtain voxel features. Finally, a CNN decoder takes the voxel features as input and produces the reconstructed objects. 3D-RETR can reconstruct from a single view or from multiple views. Experimental results on two datasets show that 3D-RETR reaches state-of-the-art performance on 3D reconstruction. An additional ablation study also demonstrates that 3D-RETR benefits from using Transformers.
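To make the three-stage data flow concrete, the following is a minimal shape-level sketch in plain Python. It only traces tensor shapes through the pretrained Transformer, the Transformer decoder, and the CNN decoder; it is not the model itself. Every name and number in it (`Config`, `trace`, the 224-pixel input, the 16-pixel patches, the 768-wide tokens, the 4x4x4 voxel tokens, the three upsampling steps, and the concatenation of tokens across views) is an assumption chosen for illustration and is not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    # Every value below is an illustrative assumption, not taken from the abstract.
    image_size: int = 224     # input images are image_size x image_size pixels
    patch_size: int = 16      # edge length of one ViT-style image patch
    embed_dim: int = 768      # token width shared by both Transformers
    query_grid: int = 4       # the decoder emits query_grid**3 voxel tokens
    upsample_steps: int = 3   # each CNN decoder step doubles the resolution


def encoder_output(cfg, num_views):
    """Shape of the visual features: one token per image patch per view."""
    patches_per_view = (cfg.image_size // cfg.patch_size) ** 2
    # Assumption: tokens from all views are concatenated into one sequence.
    return (num_views * patches_per_view, cfg.embed_dim)


def decoder_output(cfg):
    """Shape of the voxel features: a fixed-size token set, whatever the view count."""
    return (cfg.query_grid ** 3, cfg.embed_dim)


def cnn_decoder_output(cfg):
    """Shape of the reconstructed voxel grid after all upsampling steps."""
    side = cfg.query_grid * 2 ** cfg.upsample_steps
    return (side, side, side)


def trace(cfg, num_views):
    """Collect the output shape of each stage for a given number of input views."""
    return {
        "visual_features": encoder_output(cfg, num_views),
        "voxel_features": decoder_output(cfg),
        "voxels": cnn_decoder_output(cfg),
    }


if __name__ == "__main__":
    for views in (1, 3):
        print(views, trace(Config(), views))
```

Under these assumptions, only the length of the visual-feature sequence grows with the number of views, while the voxel features and the output grid keep a fixed size. This is one plausible way a single architecture could serve both the single-view and the multi-view setting.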