Visually exploring a real-world 4D spatiotemporal space freely in VR has been a long-standing quest. The task is especially appealing when only a few or even a single RGB camera is used to capture the dynamic scene. To this end, we present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering. First, we propose to decompose the 4D spatiotemporal space according to temporal characteristics. Points in the 4D space are associated with probabilities of belonging to three categories: static, deforming, and new areas. Each area is represented and regularized by a separate neural field. Second, we propose a feature streaming scheme based on hybrid representations for efficiently modeling the neural fields. Our approach, coined NeRFPlayer, is evaluated on dynamic scenes captured by single hand-held cameras and multi-camera arrays, achieving rendering quality and speed comparable to or better than recent state-of-the-art methods, with reconstruction in 10 seconds per frame and interactive rendering.
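To make the decomposition idea concrete, below is a minimal PyTorch-style sketch, not the authors' implementation, of how per-point probabilities over the three temporal categories could blend the outputs of three separate fields. All class names, layer sizes, and the small MLPs standing in for the actual neural fields are illustrative assumptions.

```python
import torch
import torch.nn as nn


class DecomposedDynamicField(nn.Module):
    """Blends three per-category fields (static, deforming, new) using
    per-point decomposition probabilities predicted from the 4D input.
    Hypothetical sketch; the real fields would be full radiance fields."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Placeholder MLPs for the three category-specific fields.
        self.static_field = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 4))
        self.deform_field = nn.Sequential(nn.Linear(4, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 4))
        self.new_field = nn.Sequential(nn.Linear(4, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 4))
        # Predicts a 3-way probability (static / deforming / new) per 4D point.
        self.decomposition = nn.Sequential(nn.Linear(4, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 3))

    def forward(self, xyz: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        xyzt = torch.cat([xyz, t], dim=-1)                        # (N, 4)
        probs = torch.softmax(self.decomposition(xyzt), dim=-1)   # (N, 3), sums to 1
        out_static = self.static_field(xyz)                       # static field ignores time
        out_deform = self.deform_field(xyzt)
        out_new = self.new_field(xyzt)
        # Probability-weighted blend of the (RGB, density) outputs.
        outs = torch.stack([out_static, out_deform, out_new], dim=-1)  # (N, 4, 3)
        return (outs * probs.unsqueeze(1)).sum(dim=-1)             # (N, 4)


if __name__ == "__main__":
    # Usage: query a batch of sampled points at given times.
    model = DecomposedDynamicField()
    xyz = torch.rand(1024, 3)
    t = torch.rand(1024, 1)
    rgb_sigma = model(xyz, t)
    print(rgb_sigma.shape)  # torch.Size([1024, 4])
```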