3D object detection using point clouds has attracted increasing attention due to its wide applications in autonomous driving and robotics. However, most existing studies focus on single point cloud frames and do not harness the temporal information in point cloud sequences. In this paper, we design TransPillars, a novel transformer-based feature aggregation technique that exploits temporal features of consecutive point cloud frames for multi-frame 3D object detection. TransPillars aggregates spatial-temporal point cloud features from two perspectives. First, it fuses voxel-level features directly from multi-frame feature maps instead of pooled instance features, preserving the instance details and contextual information that are essential to accurate object localization. Second, it introduces a hierarchical coarse-to-fine strategy that fuses multi-scale features progressively, so that coarse features capture the motion of moving objects and guide the aggregation of fine features. In addition, a variant of the deformable transformer is introduced to improve the effectiveness of cross-frame feature matching. Extensive experiments show that the proposed TransPillars achieves state-of-the-art performance compared with existing multi-frame detection approaches. Code will be released.
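To make the two ideas above concrete, the following is a minimal NumPy sketch of voxel-level cross-frame fusion applied coarse to fine. It is an illustration under stated assumptions, not the TransPillars implementation: it uses plain global single-head attention without learned projections in place of the deformable transformer variant, a single coarse level, and hypothetical function names (`cross_frame_attention`, `coarse_to_fine_fusion`) that do not come from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_frame_attention(cur, prev):
    """Fuse previous-frame features into the current frame, cell by cell.

    cur, prev: (H, W, C) bird's-eye-view feature maps.
    Assumption: plain global attention with no learned weights,
    standing in for the deformable attention used in the paper.
    """
    h, w, c = cur.shape
    q = cur.reshape(-1, c)   # queries: current-frame cells
    k = prev.reshape(-1, c)  # keys and values: previous-frame cells
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)  # (H*W, H*W)
    fused = q + attn @ k     # residual connection keeps current features
    return fused.reshape(h, w, c)


def avg_pool2(x):
    """2x2 average pooling; H and W must be even."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def upsample2(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def coarse_to_fine_fusion(cur, prev):
    """Fuse at a coarse scale first, then let the result guide the fine scale."""
    coarse = cross_frame_attention(avg_pool2(cur), avg_pool2(prev))
    guided = cur + upsample2(coarse)  # coarse context added to the fine queries
    return cross_frame_attention(guided, prev)


rng = np.random.default_rng(0)
cur_frame = rng.normal(size=(8, 8, 16))
prev_frame = rng.normal(size=(8, 8, 16))
fused_map = coarse_to_fine_fusion(cur_frame, prev_frame)
print(fused_map.shape)
```

Because fusion operates on every cell of the feature map rather than on pooled per-object features, the output keeps the full spatial resolution of the current frame. A deformable variant would replace the dense attention matrix with a small set of sampled locations per query.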