We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene. State-of-the-art methods based on temporally varying Neural Radiance Fields (aka dynamic NeRFs) have shown impressive results on this task. However, for long videos with complex object motions and uncontrolled camera trajectories, these methods can produce blurry or inaccurate renderings, hampering their use in real-world applications. Instead of encoding the entire dynamic scene within the weights of an MLP, we present a new approach that addresses these limitations by adopting a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views in a scene-motion-aware manner. Our system retains the advantages of prior methods in its ability to model complex scenes and view-dependent effects, but also enables synthesizing photo-realistic novel views from long videos featuring complex scene dynamics with unconstrained camera trajectories. We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets, and also apply our approach to in-the-wild videos with challenging camera and object motion, where prior methods fail to produce high-quality renderings. Our project webpage is at dynibar.github.io.
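To make the central idea concrete, here is a minimal, purely illustrative sketch of scene-motion-aware feature aggregation: each 3D sample along a target ray is displaced by an estimated scene-motion field to a nearby source view's time, projected into that view, and the sampled image features are combined across views. All function and parameter names (`project`, `aggregate_motion_aware`, `motion_fn`) are assumptions for illustration, not the paper's API; the actual system uses a far more involved pipeline with learned aggregation.

```python
import numpy as np

def project(K, E, points_world):
    """Project 3D world points into a camera with intrinsics K (3x3)
    and world-to-camera extrinsics E (3x4). Returns pixel coords and depth."""
    pts_h = np.concatenate([points_world, np.ones((len(points_world), 1))], axis=1)
    cam = (E @ pts_h.T).T                       # (N, 3) points in camera frame
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3], cam[:, 2]

def aggregate_motion_aware(sample_points, t_target, source_views, motion_fn):
    """For each 3D sample on a target ray, displace it by the estimated scene
    motion toward each nearby source view's time, project it into that view,
    sample a feature (nearest-neighbor here for simplicity), and average the
    features across views. A simple mean stands in for learned weights."""
    feats = []
    for view in source_views:
        displaced = sample_points + motion_fn(sample_points, t_target, view["time"])
        uv, depth = project(view["K"], view["E"], displaced)
        h, w, _ = view["features"].shape
        u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
        v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
        valid = (depth > 0).astype(view["features"].dtype)
        feats.append(view["features"][v, u] * valid[:, None])
    return np.mean(feats, axis=0)

# Toy usage: two source views with random feature maps and a placeholder
# (zero-displacement) motion model.
rng = np.random.default_rng(0)
K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
E = np.hstack([np.eye(3), np.zeros((3, 1))])            # identity camera pose
views = [{"K": K, "E": E, "time": t, "features": rng.random((64, 64, 8))}
         for t in (0.0, 1.0)]
pts = np.array([[0.0, 0.0, 2.0], [0.1, -0.1, 3.0]])     # samples along a target ray
static_motion = lambda p, t0, t1: np.zeros_like(p)      # hypothetical motion field
print(aggregate_motion_aware(pts, 0.5, views, static_motion).shape)  # -> (2, 8)
```

In this sketch the motion model and the aggregation weights are placeholders; the point is only to show how aggregating from nearby source views, rather than decoding everything from a single MLP, can remain aware of scene motion when warping samples across time.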