Recent implicit neural rendering methods have demonstrated that it is possible to learn accurate view synthesis for complex scenes by predicting their volumetric density and color, supervised solely by a set of RGB images. However, existing methods are restricted to learning efficient representations of static scenes that encode all scene objects into a single neural network, and they lack the ability to represent dynamic scenes or to decompose scenes into individual objects. In this work, we present the first neural rendering method that decomposes dynamic scenes into scene graphs. We propose a learned scene graph representation, which encodes object transformations and radiance, to efficiently render novel arrangements and views of the scene. To this end, we learn an implicitly encoded scene, combined with a jointly learned latent representation that describes objects with a single implicit function. We assess the proposed method on synthetic and real automotive data, validating that our approach learns dynamic scenes solely by observing a video of the scene, and enables rendering photo-realistic views of novel scene compositions with unseen sets of objects at unseen poses.
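To make the representation concrete, the following is a minimal sketch, not the authors' implementation, of the kind of object-centric radiance field the abstract describes: each scene-graph leaf node stores a rigid pose and a latent object code, and a single implicit function shared across objects is queried with sample points expressed in the node's local frame together with that node's latent code. All class and function names, layer sizes, and the PyTorch framing are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch (assumed structure, not the authors' code) of a scene-graph
# node plus a shared, latent-conditioned implicit radiance function.

from dataclasses import dataclass
import torch
import torch.nn as nn


@dataclass
class ObjectNode:
    """Leaf node of the scene graph: a rigid pose and a latent object code."""
    obj_to_world: torch.Tensor   # (4, 4) homogeneous transform (assumed rigid)
    latent: torch.Tensor         # (d_latent,) learned object descriptor


class SharedObjectRadianceField(nn.Module):
    """One implicit function shared by all objects, conditioned on a latent code."""

    def __init__(self, d_latent: int = 64, d_hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + d_latent, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, 4),  # outputs (r, g, b, sigma) per point
        )

    def forward(self, x_local: torch.Tensor, latent: torch.Tensor) -> torch.Tensor:
        # x_local: (N, 3) points in the object's local frame; latent: (d_latent,)
        z = latent.expand(x_local.shape[0], -1)
        return self.mlp(torch.cat([x_local, z], dim=-1))


def query_object(node: ObjectNode, field: SharedObjectRadianceField,
                 x_world: torch.Tensor) -> torch.Tensor:
    """Map world-space samples into the node's frame and query the shared field."""
    world_to_obj = torch.linalg.inv(node.obj_to_world)
    x_h = torch.cat([x_world, torch.ones_like(x_world[:, :1])], dim=-1)  # (N, 4)
    x_local = (x_h @ world_to_obj.T)[:, :3]
    return field(x_local, node.latent)


if __name__ == "__main__":
    field = SharedObjectRadianceField()
    node = ObjectNode(obj_to_world=torch.eye(4), latent=torch.randn(64))
    samples = torch.rand(8, 3)            # world-space sample points
    rgb_sigma = query_object(node, field, samples)
    print(rgb_sigma.shape)                # torch.Size([8, 4])
```

In such a setup, composing per-object queries with a static background node along each camera ray and volume-integrating the predicted densities and colors would produce the rendered image, and editing the node poses or latent codes would yield the novel scene arrangements the abstract refers to; the exact compositing and ray-sampling scheme used by the paper is not reproduced here.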