We present an algorithm for generating novel views at arbitrary viewpoints and at any input time step given a monocular video of a dynamic scene. Our work builds upon recent advances in neural implicit representation and uses continuous and differentiable functions for modeling the time-varying structure and appearance of the scene. We jointly train a time-invariant static NeRF and a time-varying dynamic NeRF, and learn how to blend the results in an unsupervised manner. However, learning this implicit function from a single video is highly ill-posed (there are infinitely many solutions that match the input video). To resolve the ambiguity, we introduce regularization losses that encourage a more physically plausible solution. We show extensive quantitative and qualitative results of dynamic view synthesis from casually captured videos.
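As a minimal sketch (not the authors' implementation), the snippet below shows one plausible way to blend per-point predictions from a time-invariant static branch and a time-varying dynamic branch via an unsupervised blending weight. The class names, MLP sizes, input encodings, and the exact convex-combination formula are illustrative assumptions; the paper's networks and rendering pipeline are more involved.

```python
import torch
import torch.nn as nn

class StaticBranch(nn.Module):
    """Time-invariant field: (x, y, z) -> color and density."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 4))

    def forward(self, xyz):
        out = self.net(xyz)
        return torch.sigmoid(out[..., :3]), torch.relu(out[..., 3:4])  # rgb, sigma

class DynamicBranch(nn.Module):
    """Time-varying field: (x, y, z, t) -> color, density, and blending weight."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 5))

    def forward(self, xyz, t):
        out = self.net(torch.cat([xyz, t], dim=-1))
        rgb, sigma = torch.sigmoid(out[..., :3]), torch.relu(out[..., 3:4])
        blend = torch.sigmoid(out[..., 4:5])  # b in (0, 1), learned without supervision
        return rgb, sigma, blend

def blended_point(static_nerf, dynamic_nerf, xyz, t):
    """Convex combination of the two branches; the blended (rgb, sigma)
    would then feed a standard NeRF-style volume-rendering integral."""
    rgb_s, sigma_s = static_nerf(xyz)
    rgb_d, sigma_d, b = dynamic_nerf(xyz, t)
    rgb = b * rgb_d + (1.0 - b) * rgb_s
    sigma = b * sigma_d + (1.0 - b) * sigma_s
    return rgb, sigma
```

Because the blending weight is produced by the networks themselves and trained only through the photometric reconstruction loss (plus the regularization terms mentioned above), no explicit static/dynamic segmentation labels are required.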