Deep learning has made great progress in multi-view 3D reconstruction. Most mainstream solutions establish the mapping between views and an object's shape by assembling a 2D encoder and a 3D decoder as the basic structure, while adopting different approaches to aggregate features from multiple views. Among them, methods using attention-based fusion perform better and more stably than the others; however, they still have an obvious shortcoming: each view predicts its merging weight independently, so the fusion lacks adaptation to the global state. In this paper, we propose a global-aware attention-based fusion approach that builds the correlation between each branch and the global state to provide a comprehensive foundation for weight inference. To enhance the network's capability, we introduce a novel loss function that supervises the shape as a whole, and we propose a dynamic two-stage training strategy that adapts effectively to any reconstructor with attention-based fusion. Experiments on ShapeNet verify that our method outperforms existing SOTA methods while using far fewer parameters than the comparable algorithm Pix2Vox++. Furthermore, we propose a view-reduction method based on maximizing diversity and discuss the cost-performance tradeoff of our model, achieving better performance under heavy input loads and limited computational budgets.
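To make the core idea concrete, the sketch below illustrates one minimal, hypothetical form of global-aware attention fusion: each view's feature is scored jointly with a global context (here, simply the mean of all view features) before the softmax-weighted merge, so the predicted weights depend on the global state rather than on each view in isolation. This is a toy NumPy stand-in, not the authors' actual network; the scoring vector `w` stands in for a learned scoring sub-network.

```python
import numpy as np

def softmax(x, axis=0):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_aware_fusion(view_feats, w):
    """Fuse per-view features with weights conditioned on a global context.

    view_feats: (n_views, d) array of per-view features
    w: (2*d,) toy scoring vector (stand-in for a learned score network)
    Returns (fused_feature, attention_weights).
    """
    n_views, d = view_feats.shape
    g = view_feats.mean(axis=0)                       # global context vector
    # concatenate each view's feature with the shared global context
    ctx = np.concatenate([view_feats, np.tile(g, (n_views, 1))], axis=1)
    scores = ctx @ w                                  # per-view scores, global-aware
    weights = softmax(scores)                         # attention weights sum to 1
    fused = weights @ view_feats                      # weighted merge of views
    return fused, weights

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))                   # 4 views, 8-dim features
w = rng.standard_normal(16)
fused, weights = global_aware_fusion(feats, w)
```

In contrast, a purely view-independent scheme would score each view from its own feature alone; concatenating the global context is what lets every branch's weight reflect the overall state of the input set.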