Multi-resolution hash encoding has recently been proposed to reduce the computational cost of neural rendering methods such as NeRF. This method requires accurate camera poses for the scenes being rendered. However, unlike previous methods that jointly optimize camera poses and 3D scenes, naive gradient-based camera pose refinement with multi-resolution hash encoding severely degrades performance. We propose a joint optimization algorithm that calibrates camera poses and learns a geometric representation using efficient multi-resolution hash encoding. We show that the oscillating gradient flows of hash encoding interfere with the registration of camera poses, and we address the issue with a smooth interpolation weighting that stabilizes the gradient oscillation for ray samples across hash grids. Moreover, a curriculum training procedure helps to learn the level-wise hash encoding, further improving pose refinement. Experiments on novel-view synthesis datasets validate that our learning framework achieves state-of-the-art performance and rapid convergence of neural rendering, even when initial camera poses are unknown.
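To illustrate why the interpolation weighting matters for pose refinement, the following is a minimal one-dimensional sketch, not the implementation from this work. The abstract does not state which smooth weighting is used; the cosine ramp below is an assumption chosen for illustration, and the names `interp_1d` and `max_gradient_jump` are hypothetical. With linear weights, the derivative of the interpolated feature with respect to the sample position jumps whenever a sample crosses a grid node; with a smooth weight the derivative vanishes at the nodes and stays continuous.

```python
import numpy as np


def interp_1d(x, grid, smooth=False):
    """Interpolate `grid` (values stored at integer nodes) at positions x.

    Returns (value, d value / d x). The positional derivative is the signal
    that would flow back to a camera pose through the sample position.
    """
    x = np.asarray(x, dtype=np.float64)
    i0 = np.clip(np.floor(x).astype(int), 0, len(grid) - 2)
    t = x - i0  # fractional offset inside the cell, in [0, 1]
    if smooth:
        # Assumed smooth weight: cosine ramp with zero slope at t=0 and t=1.
        w = 0.5 * (1.0 - np.cos(np.pi * t))
        dw = 0.5 * np.pi * np.sin(np.pi * t)
    else:
        # Standard linear weight: constant slope inside each cell.
        w = t
        dw = np.ones_like(t)
    diff = grid[i0 + 1] - grid[i0]
    return grid[i0] + w * diff, dw * diff


def max_gradient_jump(grid, smooth, eps=1e-6):
    """Largest change in d value / d x when crossing an interior grid node."""
    nodes = np.arange(1, len(grid) - 1, dtype=np.float64)
    _, g_left = interp_1d(nodes - eps, grid, smooth)
    _, g_right = interp_1d(nodes + eps, grid, smooth)
    return float(np.max(np.abs(g_right - g_left)))


rng = np.random.default_rng(0)
grid = rng.normal(size=16)  # stand-in for one level of learned features

linear_jump = max_gradient_jump(grid, smooth=False)
smooth_jump = max_gradient_jump(grid, smooth=True)
print(f"linear: {linear_jump:.4f}, smooth: {smooth_jump:.6f}")
```

Both weightings reproduce the stored values exactly at the nodes; only the behaviour of the positional gradient between and across cells differs. The same idea extends to trilinear interpolation by applying the weight per axis.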