Dynamic scene rendering and reconstruction play a crucial role in computer vision and augmented reality. Recent methods based on 3D Gaussian Splatting (3DGS) have enabled accurate modeling of dynamic urban scenes, but they require both camera and LiDAR data, ground-truth 3D segmentations, and motion data in the form of tracklets or pre-defined object templates such as SMPL. In this work, we explore whether a combination of 2D object-agnostic priors, in the form of depth and point tracking, coupled with a signed distance function (SDF) representation for dynamic objects, can relax some of these requirements. We present a novel approach that integrates SDFs with 3DGS to create a more robust object representation by harnessing the strengths of both methods. Our unified optimization framework enhances the geometric accuracy of 3DGS and improves deformation modeling within the SDF, resulting in a more adaptable and precise representation. We demonstrate that our method achieves state-of-the-art rendering metrics on urban scenes even without LiDAR data. When LiDAR is incorporated, our approach further improves reconstruction and novel-view synthesis across diverse object categories, without ground-truth 3D motion annotations. Additionally, our method enables various scene-editing tasks, including scene decomposition and composition.
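To make the SDF/3DGS coupling concrete, below is a minimal sketch of what such a unified optimization loop could look like, assuming a PyTorch setup. All names here (SDFNetwork, render_gaussians, the loss weights) are illustrative assumptions, not the paper's actual implementation: a photometric rendering term supervises the Gaussian parameters, a surface term pulls Gaussian centers onto the SDF zero level set so each representation regularizes the other, and an Eikonal term keeps the SDF a valid distance field.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: all names and weights below are assumptions,
# not the authors' actual code or API.

class SDFNetwork(nn.Module):
    """Tiny MLP mapping a 3D point to a signed distance value."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

def render_gaussians(centers):
    """Stand-in for a differentiable 3DGS renderer: a real system would
    splat full Gaussians to an image; here we just project centers."""
    return centers[:, :2]  # (N, 2) "image-space" positions

# Learnable Gaussian centers (one parameter set of the 3DGS representation).
centers = nn.Parameter(torch.randn(1024, 3) * 0.1)
sdf = SDFNetwork()
optimizer = torch.optim.Adam([centers, *sdf.parameters()], lr=1e-3)

target = torch.zeros(1024, 2)  # stand-in for observed photometric evidence

for step in range(200):
    optimizer.zero_grad()
    # Photometric term: supervises the Gaussian parameters via rendering.
    loss_render = (render_gaussians(centers) - target).pow(2).mean()
    # Geometric coupling: pull Gaussian centers onto the SDF zero level
    # set, so the SDF sharpens 3DGS geometry and vice versa.
    loss_surface = sdf(centers).abs().mean()
    # Eikonal regularizer keeps the SDF a distance field (|grad f| ~= 1).
    pts = centers.detach().requires_grad_(True)
    grad = torch.autograd.grad(sdf(pts).sum(), pts, create_graph=True)[0]
    loss_eikonal = (grad.norm(dim=-1) - 1.0).pow(2).mean()
    (loss_render + 0.1 * loss_surface + 0.01 * loss_eikonal).backward()
    optimizer.step()
```

The key design point this sketch illustrates is that both representations share one optimizer, so gradients from the rendering loss and from the SDF surface loss flow into the same parameters rather than being trained in separate stages.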