Inferring the 3D structure of a non-rigid dynamic scene from a single moving camera is an under-constrained problem. Inspired by the remarkable progress of neural radiance fields (NeRFs) in photo-realistic novel view synthesis of static scenes, extensions have been proposed for dynamic settings. These methods rely heavily on neural priors in order to regularize the problem. In this work, we take a step back and reinvestigate how current implementations may entail deleterious effects, including limited expressiveness, entanglement of light and density fields, and sub-optimal motion localization. As a remedy, we advocate for a bridge between classic non-rigid structure-from-motion (\nrsfm) and NeRF, enabling the well-studied priors of the former to constrain the latter. To this end, we propose a framework that factorizes time and space by formulating a scene as a composition of bandlimited, high-dimensional signals. We demonstrate compelling results across complex dynamic scenes that involve changes in lighting, texture, and long-range dynamics.
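The space-time factorization can be illustrated with a minimal sketch. Assuming (purely for illustration; none of these names or choices come from the paper) that the bandlimited temporal component is a truncated Fourier basis and the spatial coefficients come from a toy linear map, a scene signal $f(x, t)$ decomposes as a sum over frequency bands:

```python
import numpy as np

# Hypothetical sketch, NOT the paper's implementation: a scene signal
# f(x, t) factorised into spatial coefficient fields c_k(x) combined
# with a bandlimited (truncated Fourier) temporal basis b_k(t).
# K, scene_signal, and the linear coefficient map are all illustrative.

K = 4  # band limit: number of temporal frequency bands
rng = np.random.default_rng(0)

def temporal_basis(t, K):
    """Truncated Fourier basis in time: [1, cos(2*pi*k*t), sin(2*pi*k*t), ...]."""
    feats = [np.ones_like(t)]
    for k in range(1, K):
        feats.append(np.cos(2 * np.pi * k * t))
        feats.append(np.sin(2 * np.pi * k * t))
    return np.stack(feats, axis=-1)            # shape (..., 2K - 1)

def scene_signal(x, t, coeffs):
    """f(x, t) = sum_k c_k(x) * b_k(t), with c_k(x) a toy linear map of x."""
    b = temporal_basis(t, K)                   # (..., 2K - 1) temporal factors
    c = x @ coeffs                             # (..., 2K - 1) spatial factors
    return np.sum(c * b, axis=-1)              # compose bands into one signal

coeffs = rng.standard_normal((3, 2 * K - 1))   # toy spatial coefficient map
x = rng.standard_normal((5, 3))                # 5 sample points in space
t = np.linspace(0.0, 1.0, 5)                   # 5 time stamps
vals = scene_signal(x, t, coeffs)
print(vals.shape)                              # one scalar per (x, t) sample
```

The key property this sketch captures is that time enters only through a fixed, bandlimited basis, so the temporal smoothness prior is enforced by construction rather than learned, mirroring how \nrsfm-style low-rank priors constrain the dynamic field.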