Recent works in spatiotemporal radiance fields can produce photorealistic free-viewpoint videos. However, they are inherently unsuitable for interactive streaming scenarios (e.g. video conferencing, telepresence) because they have an inevitable lag even if training is instantaneous. This is because these approaches consume entire videos and thus must buffer chunks of frames (often seconds) before processing. In this work, we take a step towards interactive streaming via a frame-by-frame approach that is naturally free of lag. Conventional wisdom holds that per-frame NeRFs are impractical due to prohibitive training costs and storage. We break this belief by introducing Incremental Neural Videos (INV), a per-frame NeRF that is efficiently trained and streamable. We designed INV based on two insights: (1) Our main finding is that MLPs naturally partition themselves into Structure and Color Layers, which store structural and color/texture information respectively. (2) We leverage this property to retain and improve upon knowledge from previous frames, thus amortizing training across frames and reducing redundant learning. As a result, with negligible changes to NeRF, INV can achieve good quality (>28.6 dB) in 8 min/frame. It can also outperform prior SOTA with 19% less training time. Additionally, our Temporal Weight Compression reduces the per-frame size to 0.3 MB/frame (6.6% of NeRF). More importantly, INV is free from buffer lag and is a natural fit for streaming. While this work does not achieve real-time training, it shows that incremental approaches like INV open new possibilities for interactive 3D streaming. Moreover, our discovery of natural information partitioning leads to a better understanding and manipulation of MLPs. Code and dataset will be released soon.