Learning editable high-resolution scene representations for dynamic scenes is an open problem, with applications spanning domains from autonomous driving to creative editing. The most successful approaches today trade off editability against supported scene complexity: neural atlases represent dynamic scenes as two deforming image layers, foreground and background, which are editable in 2D but break down when multiple objects occlude and interact. In contrast, scene graph models use annotated data such as masks and bounding boxes from autonomous-driving datasets to capture complex 3D spatial relationships, but their implicit volumetric node representations are challenging to edit view-consistently. We propose Neural Atlas Graphs (NAGs), a hybrid high-resolution scene representation in which every graph node is a view-dependent neural atlas, facilitating both 2D appearance editing and 3D ordering and positioning of scene elements. Fit at test time, NAGs achieve state-of-the-art quantitative results on the Waymo Open Dataset, improving PSNR by 5 dB over existing methods, and enable environmental editing at high resolution and visual quality, such as creating counterfactual driving scenarios with new backgrounds and edited vehicle appearance. We find that the method also generalizes beyond driving scenes and compares favorably, by more than 7 dB in PSNR, to recent matting and video editing baselines on the DAVIS video dataset with a diverse set of human- and animal-centric scenes. Project Page: https://princeton-computational-imaging.github.io/nag/
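To make the representation concrete, the sketch below illustrates one plausible reading of the abstract's core structure: a graph whose nodes each pair a view-dependent neural atlas (a small MLP mapping 2D atlas coordinates and a view direction to RGBA) with a learnable 3D placement, composited back to front. All class and function names (`ViewDependentAtlas`, `NAGNode`, `composite_back_to_front`), the network sizes, and the exact parameterization are our assumptions for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of a Neural Atlas Graph node and compositing step,
# assuming each node = (view-dependent atlas MLP, learnable 3D position).
import torch
import torch.nn as nn


class ViewDependentAtlas(nn.Module):
    """Maps 2D atlas coordinates plus a 3D view direction to RGBA."""

    def __init__(self, hidden: int = 128):
        super().__init__()
        # Input: (u, v) atlas coordinates concatenated with a unit view direction.
        self.mlp = nn.Sequential(
            nn.Linear(2 + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB + alpha
        )

    def forward(self, uv: torch.Tensor, view_dir: torch.Tensor) -> torch.Tensor:
        rgba = self.mlp(torch.cat([uv, view_dir], dim=-1))
        return torch.sigmoid(rgba)  # keep colors and alpha in [0, 1]


class NAGNode(nn.Module):
    """One graph node: a view-dependent atlas plus a 3D placement."""

    def __init__(self):
        super().__init__()
        self.atlas = ViewDependentAtlas()
        # Learnable 3D translation used for ordering/placement; a full model
        # would also carry rotation and per-frame deformation parameters.
        self.position = nn.Parameter(torch.zeros(3))


def composite_back_to_front(nodes, uv, view_dir, camera_pos):
    """Alpha-composite nodes ordered by distance to the camera, farthest first."""
    order = sorted(nodes, key=lambda n: -torch.norm(n.position - camera_pos).item())
    out = torch.zeros(uv.shape[0], 3)
    for node in order:
        rgba = node.atlas(uv, view_dir)
        rgb, alpha = rgba[..., :3], rgba[..., 3:]
        out = alpha * rgb + (1 - alpha) * out  # standard "over" compositing
    return out


# Example: render three nodes for a batch of query rays.
nodes = [NAGNode() for _ in range(3)]
uv = torch.rand(1024, 2)
view_dir = torch.randn(1024, 3)
view_dir = view_dir / view_dir.norm(dim=-1, keepdim=True)
color = composite_back_to_front(nodes, uv, view_dir, torch.tensor([0.0, 0.0, -5.0]))
```

Under this reading, the split of responsibilities is what enables both editing modes the abstract claims: each node's appearance lives in a 2D atlas, so 2D edits such as repainting a vehicle texture are direct, while the per-node 3D positions determine occlusion order, so scene elements can be re-ordered and re-placed in 3D without touching their appearance.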