Recently, data-driven single-view reconstruction methods have shown great progress in modeling 3D dressed humans. However, such methods suffer heavily from the depth ambiguities and occlusions inherent to single-view inputs. In this paper, we tackle this problem by considering a small set of input views and investigate the best strategy to exploit the information they provide. We propose a data-driven end-to-end approach that reconstructs an implicit 3D representation of dressed humans from sparse camera views. Specifically, we introduce three key components: first, a spatially consistent reconstruction that allows for arbitrary placement of the person in the input views using a perspective camera model; second, an attention-based fusion layer that learns to aggregate visual information from several viewpoints; and third, a mechanism that encodes local 3D patterns under the multi-view context. In the experiments, we show that the proposed approach outperforms the state of the art on standard data, both quantitatively and qualitatively. To demonstrate the spatially consistent reconstruction, we apply our approach to dynamic scenes. Additionally, we apply our method to real data acquired with a multi-camera platform and demonstrate that it obtains results comparable to multi-view stereo while using far fewer views.
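To make the second component concrete, the sketch below illustrates one plausible form of an attention-based fusion layer: per-view, pixel-aligned features for each 3D query point attend to one another across the view axis before being pooled into a single descriptor. This is a minimal illustration, not the authors' implementation; the class name `AttentionFusion`, the feature dimension, and the mean-pooling step are assumptions for the example.

```python
# Minimal sketch (assumed design, not the paper's code) of attention-based
# fusion of per-view features for a 3D query point, using PyTorch.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Aggregate features from V views into one descriptor per 3D point."""

    def __init__(self, feat_dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Self-attention over the view axis lets each view's feature
        # weigh the others, so occluded or oblique views can be downplayed.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (num_points, V, feat_dim), one row per input view.
        attended, _ = self.attn(view_feats, view_feats, view_feats)
        fused = self.norm(view_feats + attended)  # residual + layer norm
        return fused.mean(dim=1)                  # pool over the V views


# Usage: fuse features sampled from 4 views for 1024 query points.
fusion = AttentionFusion(feat_dim=256)
feats = torch.randn(1024, 4, 256)
fused = fusion(feats)  # (1024, 256), fed to the implicit surface decoder
```

A view-agnostic pooling such as the mean shown here keeps the layer usable with a variable number of input views, which matches the sparse-view setting the abstract describes.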