We present a method to perform novel view and time synthesis of dynamic scenes, requiring only a monocular video with known camera poses as input. To do this, we introduce Neural Scene Flow Fields, a new representation that models the dynamic scene as a time-variant continuous function of appearance, geometry, and 3D scene motion. Our representation is optimized through a neural network to fit the observed input views. We show that our representation can be used for complex dynamic scenes, including thin structures, view-dependent effects, and natural degrees of motion. We conduct a number of experiments that demonstrate our approach significantly outperforms recent monocular view synthesis methods, and show qualitative results of space-time view synthesis on a variety of real-world videos.
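To make the description of a "time-variant continuous function of appearance, geometry, and 3D scene motion" concrete, the following is a minimal illustrative sketch (not the authors' implementation): a coordinate-based network that maps a 3D position and time to color, density, and forward/backward 3D scene flow. The class name `NeuralSceneFlowField`, the layer sizes, and the positional-encoding scheme are assumptions made for illustration.

```python
# Illustrative sketch only; layer widths, output heads, and the
# positional encoding are assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn


class NeuralSceneFlowField(nn.Module):
    def __init__(self, hidden: int = 256, num_freqs: int = 10):
        super().__init__()
        self.num_freqs = num_freqs
        in_dim = (3 + 1) * (2 * num_freqs + 1)  # encoded (x, y, z, t)
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.rgb_sigma = nn.Linear(hidden, 4)   # color (3) + density (1)
        self.scene_flow = nn.Linear(hidden, 6)  # 3D flow to times t-1 and t+1

    def positional_encoding(self, p: torch.Tensor) -> torch.Tensor:
        # Fourier features: [p, sin(2^k p), cos(2^k p)] for k = 0..num_freqs-1.
        feats = [p]
        for k in range(self.num_freqs):
            feats += [torch.sin((2.0 ** k) * p), torch.cos((2.0 ** k) * p)]
        return torch.cat(feats, dim=-1)

    def forward(self, x: torch.Tensor, t: torch.Tensor):
        # x: (N, 3) sample positions along camera rays, t: (N, 1) normalized time.
        h = self.mlp(self.positional_encoding(torch.cat([x, t], dim=-1)))
        rgb_sigma = self.rgb_sigma(h)
        rgb = torch.sigmoid(rgb_sigma[:, :3])      # color in [0, 1]
        sigma = torch.relu(rgb_sigma[:, 3:])       # non-negative volume density
        flow = self.scene_flow(h).view(-1, 2, 3)   # per-point 3D flow to t-1, t+1
        return rgb, sigma, flow
```

In such a setup, the predicted color and density at sampled ray points would be volume-rendered into pixels and compared against the observed input frames, which is how the representation would be fit to a monocular video; the flow outputs link corresponding points across neighboring time steps.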