We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface. The 3D points have associated semantics and can move freely in 3D space. This allows for optimal coverage of the person of interest, beyond just the body shape, which, in turn, helps model accessories, hair, and loose clothing. Building on this representation, we present a complete 3D transformer-based attention framework which, given a single image of a person in an unconstrained pose, generates an animatable 3D reconstruction with albedo and illumination decomposition, produced by a single end-to-end model, trained semi-supervised, with no additional post-processing. We show that our S3F model surpasses the previous state of the art on various tasks, including monocular 3D reconstruction, as well as albedo and shading estimation. Moreover, we show that the proposed methodology supports novel view synthesis, relighting, and re-posing of the reconstruction, and can naturally be extended to handle multiple input images (e.g. different views of a person, or the same view in different poses, as in video). Finally, we demonstrate the editing capabilities of our model for 3D virtual try-on applications.
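The core pooling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a hypothetical pinhole camera with intrinsics `K`, projects each 3D surface point into the image plane, and bilinearly samples a CNN feature map at the projected location to obtain a pixel-aligned feature per point.

```python
import numpy as np

def pool_pixel_aligned_features(feature_map, points_3d, K):
    """Bilinearly sample image features at the 2D projections of 3D points.

    Hypothetical sketch of pixel-aligned feature pooling:
      feature_map: (C, H, W) image feature map.
      points_3d:   (N, 3) points in camera coordinates (e.g. sampled
                   from a parametric body mesh surface).
      K:           (3, 3) pinhole camera intrinsics (assumed).
    Returns an (N, C) array of per-point feature vectors.
    """
    C, H, W = feature_map.shape
    # Perspective projection: u = fx*X/Z + cx, v = fy*Y/Z + cy.
    proj = (K @ points_3d.T).T
    uv = proj[:, :2] / proj[:, 2:3]

    feats = np.zeros((points_3d.shape[0], C))
    for i, (u, v) in enumerate(uv):
        # Clamp to the image bounds, then blend the four neighbors.
        u = float(np.clip(u, 0, W - 1))
        v = float(np.clip(v, 0, H - 1))
        u0, v0 = int(np.floor(u)), int(np.floor(v))
        u1, v1 = min(u0 + 1, W - 1), min(v0 + 1, H - 1)
        du, dv = u - u0, v - v0
        feats[i] = ((1 - du) * (1 - dv) * feature_map[:, v0, u0]
                    + du * (1 - dv) * feature_map[:, v0, u1]
                    + (1 - du) * dv * feature_map[:, v1, u0]
                    + du * dv * feature_map[:, v1, u1])
    return feats
```

Because the sampled points carry body-surface semantics and are free to move in 3D, the same pooling can gather features beyond the bare body shape (hair, loose clothing) once the points are displaced.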