Visual SLAM is a cornerstone technique in robotics, autonomous driving, and extended reality (XR), yet classical systems often struggle with low-texture environments, scale ambiguity, and degraded performance under challenging visual conditions. Recent advances in feed-forward neural network-based pointmap regression have demonstrated the potential to recover high-fidelity 3D scene geometry directly from images, leveraging learned spatial priors to overcome limitations of traditional multi-view geometry methods. However, the widely validated advantages of probabilistic multi-sensor information fusion are often discarded in these pipelines. In this work, we propose MASt3R-Fusion, a multi-sensor-assisted visual SLAM framework that tightly integrates feed-forward pointmap regression with complementary sensor information, including inertial measurements and GNSS data. The system introduces Sim(3)-based visual alignment constraints (in Hessian form) into a universal metric-scale SE(3) factor graph for effective information fusion. A hierarchical factor graph design is developed that supports both real-time sliding-window optimization and global optimization with aggressive loop closures, enabling real-time pose tracking, metric-scale structure perception, and globally consistent mapping. We evaluate our approach on both public benchmarks and self-collected datasets, demonstrating substantial improvements in accuracy and robustness over existing vision-centered multi-sensor SLAM systems. The code will be released open-source to support reproducibility and further research (https://github.com/GREAT-WHU/MASt3R-Fusion).
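To make the Sim(3) alignment concrete: a Sim(3) transform relates an up-to-scale reconstruction (e.g. a regressed pointmap) to a metric frame via x' = s·R·x + t. A minimal sketch of such an alignment is the classical closed-form Umeyama solution below; this is a generic illustration of Sim(3) point-set registration, not the constraint formulation used in MASt3R-Fusion, and the function name is hypothetical.

```python
import numpy as np

def umeyama_sim3(X, Y):
    """Closed-form Sim(3) alignment (Umeyama, 1991).

    Finds scale s, rotation R, translation t minimizing
    sum_i || s * R @ X[i] + t - Y[i] ||^2, for X, Y of shape (N, 3).
    Illustrative only -- not the paper's Hessian-form constraint.
    """
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    # Cross-covariance between the centered point sets.
    cov = Yc.T @ Xc / len(X)
    U, D, Vt = np.linalg.svd(cov)
    # Reflection guard: force det(R) = +1.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_x = (Xc ** 2).sum() / len(X)
    s = np.trace(np.diag(D) @ S) / var_x
    t = mu_y - s * R @ mu_x
    return s, R, t
```

In a fusion pipeline of the kind the abstract describes, constraints like this would enter the factor graph as residuals with associated information (Hessian) matrices, so that inertial and GNSS factors can resolve the metric scale.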