There have been significant advancements in dynamic novel view synthesis in recent years. However, current deep learning models often require (1) prior models (e.g., SMPL human models), (2) heavy pre-processing, or (3) per-scene optimization. We propose to utilize RGBD cameras to remove these limitations and synthesize free-viewpoint videos of dynamic indoor scenes. We generate feature point clouds from RGBD frames and then render them into free-viewpoint videos via a neural renderer. However, inaccurate, unstable, and incomplete depth measurements induce severe distortions, flickering, and ghosting artifacts. To reduce these artifacts, we enforce spatial-temporal consistency via the proposed Cycle Reconstruction Consistency and Temporal Stabilization modules. We also introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views. Additionally, we present a Human-Things Interactions (HTI) dataset to validate our approach and facilitate future research. The dataset consists of 43 multi-view RGBD video sequences of everyday activities, capturing complex interactions between human subjects and their surroundings. Experiments on the HTI dataset show that our method outperforms the baselines in per-frame image fidelity and spatial-temporal consistency. We will release our code and the dataset on our website soon.
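For intuition, a minimal sketch of the point-based rendering step is given below. This is an illustrative assumption, not the paper's implementation: it unprojects a single RGBD frame into a colored point cloud and splats it into a novel view with a simple z-buffer, whereas the actual method attaches learned per-point features and decodes them with a neural renderer. The function names (`unproject`, `render_view`) and the pinhole-camera parameters are hypothetical.

```python
# Sketch only (assumed pinhole model); the paper's pipeline uses learned
# per-point features and a neural renderer instead of raw RGB splatting.
import numpy as np

def unproject(depth, rgb, K):
    """Lift an H x W depth map to 3D points (camera coords) with colors."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0  # holes in depth yield no points (cf. Regional Depth-Inpainting)
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=-1), rgb[valid]  # (N, 3) points, (N, 3) colors

def render_view(pts, colors, K, R, t, H, W):
    """Z-buffered point splat into a novel view with extrinsics (R, t)."""
    cam = pts @ R.T + t                     # transform into the novel camera frame
    z = cam[:, 2]
    front = z > 1e-6                        # keep points in front of the camera
    cam, colors, z = cam[front], colors[front], z[front]
    u = np.round(K[0, 0] * cam[:, 0] / z + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * cam[:, 1] / z + K[1, 2]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, z, colors = u[inside], v[inside], z[inside], colors[inside]
    order = np.argsort(-z)                  # paint far-to-near so near points win
    img = np.zeros((H, W, 3), dtype=colors.dtype)
    img[v[order], u[order]] = colors[order]
    return img
```

Per-frame splatting like this makes the failure modes in the abstract concrete: missing depth leaves holes in the rendered view, and frame-to-frame depth noise moves points between frames, producing the flickering and ghosting that the consistency and stabilization modules are designed to suppress.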