Recently, Gaussian Splatting (GS) has shown great potential for urban scene reconstruction in the field of autonomous driving. However, current urban scene reconstruction methods often depend on multimodal sensor inputs, \textit{i.e.}, LiDAR and images. Although the geometry prior provided by LiDAR point clouds can largely mitigate the ill-posedness of reconstruction, acquiring such accurate LiDAR data remains challenging in practice: i) precise spatiotemporal calibration between LiDAR and other sensors is required, as they may not capture data simultaneously; ii) reprojection errors arise from spatial misalignment when LiDAR and cameras are mounted at different locations. To avoid the difficulty of acquiring accurate LiDAR depth, we propose D$^2$GS, a LiDAR-free urban scene reconstruction framework. In this work, we obtain geometry priors that are as effective as LiDAR while being denser and more accurate. \textbf{First}, we initialize a dense point cloud by back-projecting multi-view metric depth predictions; this point cloud is then optimized by a Progressive Pruning strategy to improve its global consistency. \textbf{Second}, we jointly refine Gaussian geometry and the predicted dense metric depth via a Depth Enhancer. Specifically, we leverage diffusion priors from a depth foundation model to enhance the depth maps rendered by the Gaussians; in turn, the enhanced depths provide stronger geometric constraints during Gaussian training. \textbf{Finally}, we improve the accuracy of the ground geometry by constraining the shape and normal attributes of Gaussians within road regions. Extensive experiments on the Waymo dataset demonstrate that our method consistently outperforms state-of-the-art approaches, producing more accurate geometry even when compared with methods that use ground-truth LiDAR data.
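For concreteness, the back-projection step that seeds the dense point cloud can be sketched as follows. This is a minimal illustration only, assuming a pinhole camera with known intrinsics $K$ and a camera-to-world pose per view; the function name `backproject_depth` and the NumPy implementation are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift a metric depth map to a world-space point cloud (illustrative sketch).

    depth:        (H, W) metric depth in meters
    K:            (3, 3) pinhole camera intrinsics
    cam_to_world: (4, 4) camera-to-world pose
    returns:      (H*W, 3) points in the shared world frame
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates, one row per pixel.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Unproject rays through K^{-1} and scale each ray by its metric depth.
    cam_pts = (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)
    # Transform camera-space points into the world frame so that
    # points from all views accumulate into one dense cloud.
    cam_pts_h = np.concatenate([cam_pts, np.ones((cam_pts.shape[0], 1))], axis=1)
    return (cam_to_world @ cam_pts_h.T).T[:, :3]

# Example: lift a constant 10 m depth map with an identity pose.
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
pts = backproject_depth(np.full((480, 640), 10.0), K, np.eye(4))
print(pts.shape)  # (307200, 3)
```

Repeating this over all views yields the initial dense cloud, which the Progressive Pruning strategy then filters for cross-view consistency before Gaussian optimization.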


