We present DINAR, an approach for creating realistic rigged full-body avatars from single RGB images. Similar to previous works, our method uses neural textures combined with the SMPL-X body model to achieve photo-realistic avatar quality while keeping the avatars easy to animate and fast to infer. To restore the texture, we use a latent diffusion model and show how such a model can be trained in the neural texture space. The diffusion model allows us to realistically reconstruct large unseen regions, such as the back of a person, given only the frontal view. The models in our pipeline are trained using 2D images and videos only. In the experiments, our approach achieves state-of-the-art rendering quality and generalizes well to new poses and viewpoints. In particular, the approach improves the state of the art on the SnapshotPeople public benchmark.