Temporal interpolation often plays a crucial role in learning meaningful representations of dynamic scenes. In this paper, we propose a novel method for training spatiotemporal neural radiance fields of dynamic scenes based on temporal interpolation of feature vectors. We propose two feature interpolation methods depending on the underlying representation: neural networks or grids. In the neural representation, we extract features from space-time inputs via multiple neural network modules and interpolate them along the time axis. The proposed multi-level feature interpolation network effectively captures features over both short-term and long-term time ranges. In the grid representation, space-time features are learned via four-dimensional hash grids, which remarkably reduces training time. The grid representation trains more than 100 times faster than previous neural-network-based methods while maintaining rendering quality. Concatenating static and dynamic features and adding a simple smoothness term further improve the performance of the proposed models. Despite the simplicity of the model architectures, our method achieves state-of-the-art performance both in rendering quality for the neural representation and in training speed for the grid representation.
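To illustrate the core idea of temporal interpolation of feature vectors, the following is a minimal sketch, assuming per-keyframe feature vectors and simple linear blending along the time axis; the function name and data layout are hypothetical and not the paper's actual API.

```python
import numpy as np

def interpolate_features(keyframe_feats, keyframe_times, t):
    """Linearly blend the feature vectors of the two keyframes that
    bracket query time t (a hypothetical helper for illustration)."""
    # Index of the keyframe at or just before t, clamped so that
    # idx + 1 is always a valid keyframe.
    idx = np.searchsorted(keyframe_times, t, side="right") - 1
    idx = int(np.clip(idx, 0, len(keyframe_times) - 2))
    t0, t1 = keyframe_times[idx], keyframe_times[idx + 1]
    w = (t - t0) / (t1 - t0)  # blending weight in [0, 1]
    return (1 - w) * keyframe_feats[idx] + w * keyframe_feats[idx + 1]

# Toy example: 2-D feature vectors stored at three time frames.
times = np.array([0.0, 0.5, 1.0])
feats = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(interpolate_features(feats, times, 0.25))  # midway: [0.5 0.5]
```

In the full method, such interpolated features would be fed to a decoder that predicts density and color, with the multi-level variant blending features at several temporal resolutions rather than a single one.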