Image-based volumetric avatars using pixel-aligned features promise generalization to unseen poses and identities. Prior work leverages global spatial encodings and multi-view geometric consistency to reduce spatial ambiguity. However, global encodings often suffer from overfitting to the distribution of the training data, and it is difficult to learn multi-view consistent reconstruction from sparse views. In this work, we investigate common issues with existing spatial encodings and propose a simple yet highly effective approach to modeling high-fidelity volumetric avatars from sparse views. One of the key ideas is to encode relative spatial 3D information via sparse 3D keypoints. This approach is robust to the sparsity of viewpoints and cross-dataset domain gap. Our approach outperforms state-of-the-art methods for head reconstruction. On human body reconstruction for unseen subjects, we also achieve performance comparable to prior work that uses a parametric human body model and temporal feature aggregation. Our experiments show that a majority of errors in prior work stem from an inappropriate choice of spatial encoding and thus we suggest a new direction for high-fidelity image-based avatar modeling. https://markomih.github.io/KeypointNeRF
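The abstract does not spell out how the relative encoding is computed, so the following is only a minimal sketch of one way to read "relative spatial 3D information via sparse 3D keypoints". It assumes that a query point is described by its camera-space depth relative to each keypoint, passed through a sinusoidal encoding. The function name `relative_keypoint_encoding`, its parameters, and the choice of depth differences are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def relative_keypoint_encoding(query, keypoints, world_to_cam, num_freqs=4):
    """Encode a 3D query point relative to sparse 3D keypoints for one camera.

    Assumed (hypothetical) formulation: relative depth to each keypoint,
    followed by a sinusoidal positional encoding.

    query:        (3,) world-space point
    keypoints:    (K, 3) world-space keypoints
    world_to_cam: (4, 4) rigid world-to-camera transform
    returns:      (K * 2 * num_freqs,) feature vector
    """

    def to_cam(points):
        # Lift to homogeneous coordinates and apply the rigid transform.
        points = np.atleast_2d(points)
        ones = np.ones((points.shape[0], 1))
        homo = np.concatenate([points, ones], axis=1)
        return (homo @ world_to_cam.T)[:, :3]

    # Depth (camera z) of the query and of every keypoint.
    query_depth = to_cam(query)[0, 2]
    keypoint_depth = to_cam(keypoints)[:, 2]

    # Relative depth: depends only on offsets, not on absolute position.
    rel = query_depth - keypoint_depth  # (K,)

    # Sinusoidal encoding at octave-spaced frequencies.
    freqs = 2.0 ** np.arange(num_freqs)  # (F,)
    angles = rel[:, None] * freqs[None, :]  # (K, F)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1).ravel()


# Usage: three keypoints, one query point, identity camera.
keypoints = np.array([[0.0, 0.0, 1.0], [0.1, 0.2, 1.5], [-0.3, 0.0, 0.8]])
query = np.array([0.05, 0.1, 1.2])
features = relative_keypoint_encoding(query, keypoints, np.eye(4))
```

Because only offsets to the keypoints enter the features, translating the query point and the keypoints together leaves the encoding unchanged. This is the property that distinguishes a relative encoding from a global one tied to absolute coordinates.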