Neural networks can represent and accurately reconstruct radiance fields for static 3D scenes (e.g., NeRF). Several works extend these to dynamic scenes captured with monocular video, with promising performance. However, the monocular setting is known to be an under-constrained problem, and so methods rely on data-driven priors for reconstructing dynamic content. We replace these priors with measurements from a time-of-flight (ToF) camera, and introduce a neural representation based on an image formation model for continuous-wave ToF cameras. Instead of working with processed depth maps, we model the raw ToF sensor measurements to improve reconstruction quality and avoid issues with low reflectance regions, multi-path interference, and a sensor's limited unambiguous depth range. We show that this approach improves robustness of dynamic scene reconstruction to erroneous calibration and large motions, and discuss the benefits and limitations of integrating RGB+ToF sensors that are now available on modern smartphones.
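To make the continuous-wave ToF image formation referenced above concrete, the sketch below simulates the standard four-bucket correlation measurements for a single direct path and recovers a (wrapped) depth from them. This is a minimal, self-contained illustration of generic CW-ToF sensing, not the paper's actual model; the function names, parameters, and the inverse-square amplitude falloff are assumptions made for the example.

```python
import numpy as np

C = 3e8  # speed of light (m/s)

def cw_tof_raw_measurements(distance_m, albedo, mod_freq_hz, phase_offsets):
    """Simulate raw CW-ToF correlation measurements for a single direct path.

    Each raw sample is the correlation of the returned, phase-delayed signal
    with a reference shifted by a known offset psi:
        C(psi) ~ amplitude * cos(phi + psi),
    where phi = 4*pi*f*d / c encodes the round-trip distance
    (ambient offset and noise are omitted in this toy model).
    """
    phi = 4.0 * np.pi * mod_freq_hz * distance_m / C      # round-trip phase
    amplitude = albedo / np.maximum(distance_m, 1e-6) ** 2  # inverse-square falloff
    return amplitude * np.cos(phi + np.asarray(phase_offsets))

def depth_from_four_bucket(c0, c90, c180, c270, mod_freq_hz):
    """Recover wrapped depth from the four standard phase-offset samples."""
    phi = np.arctan2(c270 - c90, c0 - c180)   # consistent with cos(phi + psi) above
    phi = np.mod(phi, 2.0 * np.pi)            # wrap phase to [0, 2*pi)
    unambiguous_range = C / (2.0 * mod_freq_hz)  # e.g. ~5 m at 30 MHz
    return phi / (2.0 * np.pi) * unambiguous_range

# Example: a point 7 m away wraps to 2 m with a 30 MHz modulation frequency,
# illustrating the limited unambiguous depth range mentioned in the abstract.
offsets = np.array([0.0, np.pi / 2, np.pi, 3 * np.pi / 2])
c0, c90, c180, c270 = cw_tof_raw_measurements(
    7.0, albedo=1.0, mod_freq_hz=30e6, phase_offsets=offsets)
print(depth_from_four_bucket(c0, c90, c180, c270, mod_freq_hz=30e6))  # -> 2.0
```

The example also shows why working with raw measurements can be preferable to processed depth maps: phase wrapping, low-amplitude (low-reflectance) returns, and mixed multi-path signals are all visible in the raw correlation samples but are baked in, irrecoverably, once a single depth value has been decoded per pixel.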