Human imitation has recently become a topical research problem, driven by the ability of GANs to disentangle human pose from appearance. However, the latest methods hardly exploit 3D information, and a large number of input images is required to avoid self-occlusion. In this paper, we propose RIN, a novel volume-based framework that reconstructs a textured 3D model from a single image and imitates a subject with the generated model. Specifically, to estimate most of the human texture, we propose a U-Net-like front-to-back translation network. Given both the front and the estimated back image as input, the textured volume recovery module allows us to color a volumetric human. A sequence of 3D poses then guides the colored volume through Flowable Disentangle Networks, cast as a volume-to-volume translation task. To project volumes onto a 2D plane during training, we design a differentiable depth-aware renderer. Our experiments demonstrate that the volume-based model is adequate for human imitation and that the back view can be estimated reliably by our network. While prior works based on either 2D poses or semantic maps often fail due to the unstable appearance of a human, our framework still produces concrete results that are competitive with those produced from multi-view input.
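To make the projection step concrete, below is a minimal sketch (not the authors' implementation) of a differentiable, depth-aware renderer that maps a colored occupancy volume onto a 2D image plane. It uses standard front-to-back alpha compositing along the depth axis, so voxels closer to the camera occlude those behind them while gradients still flow to every voxel; the tensor layout, soft occupancy representation, and orthographic camera are assumptions for illustration.

```python
import torch

def project_volume(color, occupancy):
    """Sketch of a differentiable depth-aware projection (assumed interface).

    color:     (B, 3, D, H, W) RGB value per voxel
    occupancy: (B, 1, D, H, W) soft occupancy in [0, 1]
    returns:   (B, 3, H, W) rendered image, viewing along the depth axis.
    """
    # Transmittance: probability that a ray reaches voxel d without being
    # blocked by any voxel in front of it (exclusive cumulative product of
    # the per-voxel "free space" probability 1 - occupancy).
    free = 1.0 - occupancy                                  # (B, 1, D, H, W)
    trans = torch.cumprod(free, dim=2)                      # inclusive cumprod
    trans = torch.cat([torch.ones_like(trans[:, :, :1]),    # shift -> exclusive
                       trans[:, :, :-1]], dim=2)
    # Per-voxel contribution weight, then composite colors along depth.
    weight = occupancy * trans                              # (B, 1, D, H, W)
    image = (weight * color).sum(dim=2)                     # (B, 3, H, W)
    return image

if __name__ == "__main__":
    B, D, H, W = 1, 32, 64, 64
    color = torch.rand(B, 3, D, H, W, requires_grad=True)
    occupancy = torch.rand(B, 1, D, H, W, requires_grad=True)
    img = project_volume(color, occupancy)
    img.mean().backward()    # gradients reach every voxel of the volume
    print(img.shape)         # torch.Size([1, 3, 64, 64])
```

Because the compositing weights are smooth functions of the soft occupancy, such a renderer can supervise the textured volume directly with 2D image losses during training.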