This paper tackles the problem of novel view synthesis (NVS) from 2D images without known camera poses and intrinsics. Among various NVS techniques, Neural Radiance Field (NeRF) has recently gained popularity due to its remarkable synthesis quality. Existing NeRF-based approaches assume that the camera parameters associated with each input image are either directly accessible at training time, or can be accurately estimated with conventional correspondence-based techniques, such as Structure-from-Motion. In this work, we propose an end-to-end framework, termed NeRF--, for training NeRF models given only RGB images, without pre-computed camera parameters. Specifically, we show that the camera parameters, including both intrinsics and extrinsics, can be automatically discovered via joint optimisation during the training of the NeRF model. On the standard LLFF benchmark, our model achieves novel view synthesis results comparable to the baseline trained with COLMAP pre-computed camera parameters. We also conduct extensive analyses to understand the model behaviour under different camera trajectories, and show that in scenarios where COLMAP fails, our model still produces robust results.
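To make the joint-optimisation idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: camera extrinsics (axis-angle rotation plus translation per image) and a shared focal length are registered as learnable parameters and updated by the same photometric loss that trains the radiance field. The tiny MLP, toy image sizes, sampling bounds, and learning rates are all illustrative assumptions.

```python
# Minimal sketch of jointly optimising camera parameters and a NeRF-style MLP.
# All sizes, names, and hyper-parameters here are illustrative assumptions.
import torch
import torch.nn as nn

def axis_angle_to_matrix(v):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = v.norm() + 1e-8
    k = v / theta
    zero = torch.zeros((), dtype=v.dtype, device=v.device)
    K = torch.stack([
        torch.stack([zero, -k[2], k[1]]),
        torch.stack([k[2], zero, -k[0]]),
        torch.stack([-k[1], k[0], zero]),
    ])
    I = torch.eye(3, dtype=v.dtype, device=v.device)
    return I + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

num_images, H, W = 4, 32, 32                          # toy problem sizes
rot = nn.Parameter(torch.zeros(num_images, 3))        # extrinsics: axis-angle per image
trans = nn.Parameter(torch.zeros(num_images, 3))      # extrinsics: translation per image
focal = nn.Parameter(torch.tensor(float(W)))          # intrinsics: shared focal length

# Stand-in for the NeRF MLP: maps a 3D point to (r, g, b, sigma).
nerf = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 4))

# One optimiser over *both* scene and camera parameters: this is the joint optimisation.
opt = torch.optim.Adam([
    {"params": nerf.parameters(), "lr": 1e-3},
    {"params": [rot, trans, focal], "lr": 1e-3},
])

def get_rays(i):
    """Back-project the pixel grid through the learnable pinhole camera i."""
    ys, xs = torch.meshgrid(torch.arange(H).float(), torch.arange(W).float(),
                            indexing="ij")
    dirs = torch.stack([(xs - W / 2) / focal,         # gradients flow into focal here
                        -(ys - H / 2) / focal,
                        -torch.ones_like(xs)], dim=-1)
    R = axis_angle_to_matrix(rot[i])                  # ... and into rot / trans here
    rays_d = dirs.reshape(-1, 3) @ R.T
    rays_o = trans[i].expand_as(rays_d)
    return rays_o, rays_d

def render(rays_o, rays_d, n_samples=16):
    """Crude volume rendering: sample along rays, alpha-composite MLP outputs."""
    t = torch.linspace(0.1, 4.0, n_samples)
    pts = rays_o[:, None] + rays_d[:, None] * t[None, :, None]   # (rays, samples, 3)
    out = nerf(pts)
    rgb, sigma = torch.sigmoid(out[..., :3]), torch.relu(out[..., 3])
    alpha = 1 - torch.exp(-sigma * (t[1] - t[0]))
    trans_ = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1 - alpha + 1e-10], -1), -1)[:, :-1]
    return (trans_ * alpha)[..., None].mul(rgb).sum(1)           # (rays, 3)

images = torch.rand(num_images, H * W, 3)  # stand-in for the real RGB inputs
for step in range(100):
    i = step % num_images
    rays_o, rays_d = get_rays(i)
    loss = ((render(rays_o, rays_d) - images[i]) ** 2).mean()    # photometric loss
    opt.zero_grad()
    loss.backward()
    opt.step()  # one step updates the NeRF weights *and* the camera parameters
```

The key design choice the sketch illustrates is that no separate pose-estimation stage exists: because ray origins and directions are differentiable functions of the camera parameters, the photometric reconstruction loss alone supervises them alongside the scene representation.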