Adaptive Keyframe Selection for Scalable 3D Scene Reconstruction in Dynamic Environments

We propose an adaptive keyframe selection method for improved 3D scene reconstruction in dynamic environments. The method integrates two complementary modules: an error-based selection module that uses photometric and structural similarity (SSIM) errors, and a momentum-based update module that adjusts the keyframe selection thresholds according to scene motion. By curating the most informative frames, our approach addresses a key data bottleneck in real-time perception: high-quality 3D world representations can be built from a compressed data stream, a critical step toward scalable robot learning and deployment in complex, dynamic environments. Experiments show significant improvements over traditional static keyframe selection strategies, such as fixed temporal intervals or uniform frame skipping. We evaluate the module on two recent state-of-the-art 3D reconstruction networks, Spann3r and CUT3R, and observe consistent improvements in reconstruction quality across both frameworks. An extensive ablation study further confirms that each component contributes to the overall performance gains. These findings mark a meaningful step toward adaptive perception systems that respond to complex and evolving visual scenes.
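The abstract does not state how the two error terms are combined or how the momentum update is formulated, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes grayscale frames with values in [0, 1], a weighted sum of a mean-absolute photometric error and `1 - SSIM`, and a threshold that tracks an exponential moving average of recent errors. The names `alpha`, `beta`, `scale`, `init_threshold`, and `min_threshold` are hypothetical, and the single-window SSIM is a simplification of the usual windowed SSIM.

```python
import numpy as np

def photometric_error(a, b):
    """Mean absolute intensity difference between two frames in [0, 1]."""
    return float(np.mean(np.abs(a - b)))

def global_ssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed over the whole image as one window (simplified)."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(num / den)

def select_keyframes(frames, alpha=0.5, beta=0.9, scale=1.0,
                     init_threshold=0.05, min_threshold=0.01):
    """Return indices of selected keyframes (all parameters are assumptions).

    A frame becomes a keyframe when its combined error against the last
    keyframe exceeds the current threshold. The threshold follows an
    exponential moving average (momentum `beta`) of the observed errors,
    so it rises in fast-moving scenes and falls in slow ones.
    """
    keyframes = [0]              # always keep the first frame
    ref = frames[0]
    running = init_threshold     # moving average of recent errors
    threshold = init_threshold
    for i in range(1, len(frames)):
        err = (alpha * photometric_error(ref, frames[i])
               + (1 - alpha) * (1.0 - global_ssim(ref, frames[i])))
        if err > threshold:
            keyframes.append(i)
            ref = frames[i]
        # Momentum update: blend the new error into the running average,
        # then derive the next threshold, bounded below by a floor.
        running = beta * running + (1 - beta) * err
        threshold = max(min_threshold, scale * running)
    return keyframes
```

Calling `select_keyframes(frames)` on a list of equally sized 2D arrays returns the indices of the frames to pass to the reconstruction network. The floor `min_threshold` is a design choice of this sketch: without it, a long static stretch would drive the threshold toward zero and sensor noise alone could trigger a keyframe.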
