We present DINAR, an approach for creating realistic rigged full-body avatars from single RGB images. Similar to previous works, our method combines neural textures with the SMPL-X body model to achieve photo-realistic avatar quality while keeping the avatars easy to animate and fast to infer. To restore the texture, we use a latent diffusion model and show how such a model can be trained in the neural texture space. The diffusion model allows us to realistically reconstruct large unseen regions, such as the back of a person, given only a frontal view. The models in our pipeline are trained using 2D images and videos only. In our experiments, the approach achieves state-of-the-art rendering quality and generalizes well to new poses and viewpoints. In particular, it improves on the state of the art on the SnapshotPeople public benchmark.
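To make the high-level pipeline concrete, below is a minimal PyTorch sketch of the three stages the abstract describes: lifting visible pixels into a partial neural texture aligned with SMPL-X UV space, inpainting the unobserved texels with a diffusion model operating on that texture, and decoding the resampled texture into RGB with a neural renderer. All module names (PartialTextureEncoder, TextureDenoiser, NeuralRenderer), hyperparameters, and the simplified masked sampling loop are assumptions for illustration, not the authors' implementation; a real system would obtain the UV correspondence maps by rasterizing the fitted SMPL-X mesh.

```python
# Hypothetical sketch of the single-image avatar pipeline; names and
# hyperparameters are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PartialTextureEncoder(nn.Module):
    # Maps an RGB image plus a dense SMPL-X UV correspondence map to a
    # partial neural texture and a visibility mask (stand-in architecture).
    def __init__(self, channels: int = 16):
        super().__init__()
        self.to_feat = nn.Conv2d(3, channels, kernel_size=1)

    def forward(self, image, uv_grid):
        # image: (B, 3, H, W); uv_grid: (B, T, T, 2), texel -> image coords in [-1, 1]
        feats = self.to_feat(image)                                    # (B, C, H, W)
        texture = F.grid_sample(feats, uv_grid, align_corners=False)   # (B, C, T, T)
        visible = (uv_grid.abs().amax(dim=-1) <= 1).float().unsqueeze(1)
        return texture * visible, visible                              # zero unobserved texels


class TextureDenoiser(nn.Module):
    # Predicts noise on the neural texture, conditioned on the diffusion timestep.
    def __init__(self, channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, x, t):
        # Broadcast the timestep as an extra channel (simplified conditioning).
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x.shape[-2:]).float()
        return self.net(torch.cat([x, t_map], dim=1))


@torch.no_grad()
def inpaint_texture(partial_tex, mask, denoiser, steps: int = 50):
    # Simplified masked diffusion sampling over the neural-texture space:
    # denoise everywhere, then re-impose the observed texels at each step.
    x = torch.randn_like(partial_tex)
    for t in reversed(range(steps)):
        t_batch = torch.full((x.size(0),), t, device=x.device)
        eps = denoiser(x, t_batch)
        x = x - eps / steps                                # toy update rule
        x = mask * partial_tex + (1 - mask) * x            # keep observed regions
    return x


class NeuralRenderer(nn.Module):
    # Decodes the completed neural texture, sampled onto a posed SMPL-X
    # rasterization, into an RGB image.
    def __init__(self, channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, sampled_texture):
        return self.net(sampled_texture)
```

Because the diffusion model edits the texture rather than the rendered frames, the completed avatar stays consistent under re-posing and viewpoint changes: new animations only re-rasterize the SMPL-X mesh and resample the same texture.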