We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface. The 3D points have associated semantics and can move freely in 3D space. This allows for optimal coverage of the person of interest, beyond just the body shape, which in turn helps model accessories, hair, and loose clothing. Building on this representation, we present a complete 3D transformer-based attention framework that, given a single image of a person in an unconstrained pose, generates an animatable 3D reconstruction with albedo and illumination decomposition, using a single, semi-supervised, end-to-end model with no additional postprocessing. We show that our S3F model surpasses the previous state of the art on various tasks, including monocular 3D reconstruction, as well as albedo and shading estimation. Moreover, we show that the proposed methodology supports novel view synthesis, relighting, and re-posing of the reconstruction, and naturally extends to multiple input images (e.g., different views of a person, or the same view in different poses, in video). Finally, we demonstrate the editing capabilities of our model for 3D virtual try-on applications.
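The core pooling operation described above (projecting body-surface 3D points into the image and sampling pixel-aligned features) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function name, the pinhole-camera intrinsics `K`, and the bilinear sampling are illustrative assumptions.

```python
import numpy as np

def pool_pixel_aligned_features(points_3d, feature_map, K):
    """Hypothetical sketch: project camera-space 3D points with intrinsics K,
    then bilinearly sample a 2D feature map at the projected locations.

    points_3d:   (N, 3) points in camera coordinates
    feature_map: (H, W, C) image-aligned feature grid
    K:           (3, 3) pinhole intrinsics
    returns:     (N, C) per-point pooled features
    """
    proj = (K @ points_3d.T).T              # (N, 3) homogeneous image coords
    uv = proj[:, :2] / proj[:, 2:3]         # perspective divide -> pixel coords
    H, W, _ = feature_map.shape
    u = np.clip(uv[:, 0], 0, W - 1)
    v = np.clip(uv[:, 1], 0, H - 1)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    u1, v1 = np.minimum(u0 + 1, W - 1), np.minimum(v0 + 1, H - 1)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    # Bilinear interpolation of the four neighboring feature vectors
    return (feature_map[v0, u0] * (1 - du) * (1 - dv)
            + feature_map[v0, u1] * du * (1 - dv)
            + feature_map[v1, u0] * (1 - du) * dv
            + feature_map[v1, u1] * du * dv)
```

In the full model, the sampled features are attached to the mesh-surface points (which carry body semantics and may be displaced in 3D), before being processed by the transformer; this sketch only covers the projection-and-sampling step.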