We propose a novel approach for 3D video synthesis that represents multi-view video recordings of a dynamic real-world scene in a compact yet expressive representation that enables high-quality view synthesis and motion interpolation. Our approach takes the high quality and compactness of static neural radiance fields in a new direction: to a model-free, dynamic setting. At the core of our approach is a novel time-conditioned neural radiance field that represents scene dynamics using a set of compact latent codes. We significantly boost the training speed and perceptual quality of the generated imagery through a novel hierarchical training scheme combined with ray importance sampling. Our learned representation is highly compact: it represents a 10-second, 30 FPS multi-view video recording from 18 cameras with a model size of only 28 MB. We demonstrate that our method can render high-fidelity wide-angle novel views at over 1K resolution, even for complex and dynamic scenes. We perform an extensive qualitative and quantitative evaluation that shows that our approach outperforms the state of the art. Project website: https://neural-3d-video.github.io/.
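At a high level, the time-conditioned radiance field can be pictured as a standard NeRF MLP that additionally receives a learned per-frame latent code, so that time is modeled through the codes rather than a deformation model. The following PyTorch sketch illustrates this idea under stated assumptions: the class name `TimeConditionedNeRF`, the layer widths, the positional-encoding depths, and the 64-dimensional latent code are illustrative choices, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

# Minimal sketch of a time-conditioned radiance field (illustrative, not the
# authors' exact implementation): a NeRF-style MLP conditioned on a compact
# learnable latent code per video frame.

def positional_encoding(x: torch.Tensor, num_freqs: int) -> torch.Tensor:
    """Map inputs to [x, sin(2^k x), cos(2^k x)] features, as in NeRF."""
    feats = [x]
    for k in range(num_freqs):
        feats.append(torch.sin((2.0 ** k) * x))
        feats.append(torch.cos((2.0 ** k) * x))
    return torch.cat(feats, dim=-1)

class TimeConditionedNeRF(nn.Module):
    def __init__(self, num_frames: int, latent_dim: int = 64,
                 pos_freqs: int = 10, dir_freqs: int = 4, width: int = 256):
        super().__init__()
        # One compact latent code per frame, optimized jointly with the MLP.
        self.latents = nn.Embedding(num_frames, latent_dim)
        pos_dim = 3 * (1 + 2 * pos_freqs)
        dir_dim = 3 * (1 + 2 * dir_freqs)
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim + latent_dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(width, 1)   # volume density
        self.rgb_head = nn.Sequential(          # view-dependent color
            nn.Linear(width + dir_dim, width // 2), nn.ReLU(),
            nn.Linear(width // 2, 3), nn.Sigmoid(),
        )
        self.pos_freqs, self.dir_freqs = pos_freqs, dir_freqs

    def forward(self, xyz, view_dir, frame_idx):
        z = self.latents(frame_idx)  # (B, latent_dim) time conditioning
        h = self.trunk(torch.cat(
            [positional_encoding(xyz, self.pos_freqs), z], dim=-1))
        sigma = torch.relu(self.sigma_head(h))
        rgb = self.rgb_head(torch.cat(
            [h, positional_encoding(view_dir, self.dir_freqs)], dim=-1))
        return rgb, sigma

# Example query: a 10 s clip at 30 FPS yields 300 per-frame latent codes.
model = TimeConditionedNeRF(num_frames=300)
xyz = torch.rand(1024, 3)
dirs = nn.functional.normalize(torch.rand(1024, 3), dim=-1)
t = torch.randint(0, 300, (1024,))
rgb, sigma = model(xyz, dirs, t)
```

Note the design implication of this conditioning: only the MLP weights and the per-frame codes need to be stored, which is consistent with the compactness claim above, and interpolating between neighboring latent codes offers a natural route to motion interpolation.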