Recently, the transformer model has been successfully employed for the multi-view 3D reconstruction problem. However, challenges remain in designing an attention mechanism that explores multi-view features and exploits their relations to reinforce the encoding-decoding modules. This paper proposes a new model, namely the 3D coarse-to-fine transformer (3D-C2FT), which introduces a novel coarse-to-fine (C2F) attention mechanism for encoding multi-view features and rectifying defective 3D objects. The C2F attention mechanism enables the model to learn the multi-view information flow and synthesize 3D surface corrections in a coarse-to-fine manner. The proposed model is evaluated on the ShapeNet and Multi-view Real-life datasets. Experimental results show that 3D-C2FT achieves notable results and outperforms several competing models on these datasets.
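To make the coarse-to-fine idea concrete, the sketch below shows one possible way a C2F attention block over multi-view tokens could be organized: attention is applied repeatedly, first against a pooled (coarse) token sequence and then at progressively finer granularities. This is a minimal illustration under assumed dimensions and pooling levels, not the authors' exact 3D-C2FT architecture; the class name, `levels` schedule, and token layout are hypothetical.

import torch
import torch.nn as nn


class CoarseToFineAttention(nn.Module):
    """Attend over multi-view tokens at progressively finer granularities (illustrative sketch)."""

    def __init__(self, dim=256, num_heads=8, levels=(4, 2, 1)):
        super().__init__()
        # One multi-head attention layer per granularity; each entry of `levels`
        # is the pooling factor applied to the key/value sequence (coarse -> fine).
        self.levels = levels
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True) for _ in levels
        )
        self.norm = nn.ModuleList(nn.LayerNorm(dim) for _ in levels)

    def forward(self, tokens):
        # tokens: (batch, num_views * num_patches, dim) multi-view feature tokens
        x = tokens
        for pool, attn, norm in zip(self.levels, self.attn, self.norm):
            # Coarse stage: average-pool token groups into a shorter key/value
            # sequence; the finest stage (pool == 1) attends at full resolution.
            if pool > 1:
                kv = nn.functional.avg_pool1d(
                    x.transpose(1, 2), kernel_size=pool, stride=pool
                ).transpose(1, 2)
            else:
                kv = x
            out, _ = attn(query=x, key=kv, value=kv)
            x = norm(x + out)  # residual refinement at each granularity level
        return x


if __name__ == "__main__":
    views = torch.randn(2, 8 * 16, 256)  # e.g., 8 views x 16 patch tokens each
    refined = CoarseToFineAttention()(views)
    print(refined.shape)  # torch.Size([2, 128, 256])

In this reading, the coarse stages let each token aggregate view-level context cheaply, while the final full-resolution stage refines fine-grained surface details, mirroring the coarse-to-fine correction described in the abstract.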