We present a method, Neural Radiance Flow (NeRFlow), to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images. Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene. By enforcing consistency across different modalities, our representation enables multi-view rendering in diverse dynamic scenes, including water pouring, robotic interaction, and real images, outperforming state-of-the-art methods for spatial-temporal view synthesis. Our approach works even when input images are captured with only a single camera. We further demonstrate that the learned representation can serve as an implicit scene prior, enabling video processing tasks such as image super-resolution and de-noising without any additional supervision.
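To make the 4D representation concrete, here is a minimal, hypothetical sketch of the kind of implicit function the abstract describes: a network that maps a space-time query (x, y, z, t) and a viewing direction to density (occupancy), view-dependent radiance, and a 3D scene-flow vector. The layer sizes, positional encoding, and head structure are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of a NeRFlow-style 4D implicit field (not the
# authors' exact architecture): one MLP trunk over encoded space-time
# coordinates, with heads for density, radiance, and scene flow.
import torch
import torch.nn as nn

def positional_encoding(p, num_freqs=6):
    """Map coordinates to sin/cos features, as in NeRF-style models."""
    feats = [p]
    for k in range(num_freqs):
        feats.append(torch.sin((2.0 ** k) * p))
        feats.append(torch.cos((2.0 ** k) * p))
    return torch.cat(feats, dim=-1)

class RadianceFlowField(nn.Module):
    def __init__(self, num_freqs=6, hidden=256):
        super().__init__()
        in_xt = 4 * (1 + 2 * num_freqs)   # encoded (x, y, z, t)
        in_dir = 3 * (1 + 2 * num_freqs)  # encoded view direction
        self.trunk = nn.Sequential(
            nn.Linear(in_xt, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma = nn.Linear(hidden, 1)   # occupancy / density head
        self.flow = nn.Linear(hidden, 3)    # 3D scene flow at (x, t)
        self.rgb = nn.Sequential(           # view-dependent radiance head
            nn.Linear(hidden + in_dir, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyzt, view_dir):
        h = self.trunk(positional_encoding(xyzt))
        d = positional_encoding(view_dir)
        return (
            torch.relu(self.sigma(h)),                # density >= 0
            self.rgb(torch.cat([h, d], dim=-1)),      # RGB in [0, 1]
            self.flow(h),                              # per-point motion
        )

# Query a batch of 1024 space-time points with view directions.
field = RadianceFlowField()
sigma, rgb, flow = field(torch.rand(1024, 4), torch.rand(1024, 3))
```

In such a design, the flow head is what would let consistency losses tie appearance and geometry together across time; the density and radiance heads alone would reduce to a per-frame NeRF.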