Six-degree-of-freedom (6DoF) pose estimation for novel objects is a critical task in computer vision, yet it remains challenging in high-speed and low-light scenarios, where standard RGB cameras suffer from motion blur. Event cameras offer a promising alternative thanks to their high temporal resolution, but current 6DoF pose estimation methods typically still perform poorly when objects move at high speed. To address this gap, we propose PoseStreamer, a robust multi-modal 6DoF pose estimation framework designed specifically for high-speed motion. Our approach integrates three core components: an Adaptive Pose Memory Queue that uses historical orientation cues to enforce temporal consistency, an Object-centric 2D Tracker that provides strong 2D priors to boost 3D center recall, and a Ray Pose Filter that performs geometric refinement along camera rays. Furthermore, we introduce MoCapCube6D, a novel multi-modal dataset constructed to benchmark performance under rapid motion. Extensive experiments demonstrate that PoseStreamer not only achieves superior accuracy in high-speed scenarios but also generalizes well, as a template-free framework, to unseen moving objects.