Image-based volumetric humans using pixel-aligned features promise generalization to unseen poses and identities. Prior work leverages global spatial encodings and multi-view geometric consistency to reduce spatial ambiguity. However, global encodings often suffer from overfitting to the distribution of the training data, and it is difficult to learn multi-view consistent reconstruction from sparse views. In this work, we investigate common issues with existing spatial encodings and propose a simple yet highly effective approach to modeling high-fidelity volumetric humans from sparse views. One of the key ideas is to encode relative spatial 3D information via sparse 3D keypoints. This approach is robust to the sparsity of viewpoints and cross-dataset domain gap. Our approach outperforms state-of-the-art methods for head reconstruction. On human body reconstruction for unseen subjects, we also achieve performance comparable to prior work that uses a parametric human body model and temporal feature aggregation. Our experiments show that a majority of errors in prior work stem from an inappropriate choice of spatial encoding and thus we suggest a new direction for high-fidelity image-based human modeling. https://markomih.github.io/KeypointNeRF
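To make the idea of a relative spatial encoding concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "relative" means the camera-space depth offset between a query point and each 3D keypoint, that the offset is passed through a sinusoidal positional encoding, and that each keypoint's contribution is down-weighted by a Gaussian of its distance to the query. The function name `relative_keypoint_encoding` and the parameters `num_freqs` and `sigma` are hypothetical choices for illustration.

```python
import numpy as np


def relative_keypoint_encoding(query, keypoints, world_to_cam,
                               num_freqs=4, sigma=0.05):
    """Encode a 3D query point relative to sparse 3D keypoints (illustrative).

    query:        (3,)   world-space query point
    keypoints:    (K, 3) world-space keypoints
    world_to_cam: (4, 4) rigid world-to-camera transform (assumed input)
    Returns a flat vector of length K * 2 * num_freqs.
    """
    def to_cam(points):
        # Homogenize, apply the rigid transform, drop the homogeneous coordinate.
        ones = np.ones((*points.shape[:-1], 1))
        homogeneous = np.concatenate([points, ones], axis=-1)
        return (homogeneous @ world_to_cam.T)[..., :3]

    query_cam = to_cam(query[None])[0]
    keypoints_cam = to_cam(keypoints)

    # Relative quantity: per-keypoint depth offset along the camera z-axis.
    depth_diff = keypoints_cam[:, 2] - query_cam[2]            # (K,)

    # Sinusoidal positional encoding of the offsets (assumed frequency bands).
    freqs = 2.0 ** np.arange(num_freqs)                        # (F,)
    angles = depth_diff[:, None] * freqs[None, :]              # (K, F)
    encoding = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

    # Assumed Gaussian falloff so that distant keypoints contribute less.
    sq_dist = np.sum((keypoints - query) ** 2, axis=-1)        # (K,)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))

    return (encoding * weights[:, None]).reshape(-1)
```

Because the encoding depends only on offsets between the query and the keypoints, translating the whole scene leaves it unchanged, whereas an encoding of absolute coordinates would change with the shift. This is one plausible reading of why a relative encoding could be less sensitive to the training-data distribution than a global one.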