With the prevalence of LiDAR sensors in autonomous driving, 3D object tracking has received increasing attention. In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in consecutive frames given an object template. Motivated by the success of transformers, we propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results in a coarse-to-fine manner with the help of transformer operations. PTTR consists of three novel designs. 1) Instead of random sampling, we design Relation-Aware Sampling to preserve relevant points to the given template during subsampling. 2) We propose a Point Relation Transformer for effective feature aggregation and feature matching between the template and search region. 3) Based on the coarse tracking results, we employ a novel Prediction Refinement Module to obtain the final refined prediction through local feature pooling. In addition, motivated by the favorable properties of the Bird's-Eye View (BEV) of point clouds in capturing object motion, we further design a more advanced framework named PTTR++, which incorporates both the point-wise view and BEV representation to exploit their complementary effect in generating high-quality tracking results. PTTR++ substantially boosts the tracking performance on top of PTTR with low computational overhead. Extensive experiments over multiple datasets show that our proposed approaches achieve superior 3D tracking accuracy and efficiency.
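The Relation-Aware Sampling idea above — keeping the search-region points most relevant to the template during subsampling, rather than sampling at random — can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: the function name, the use of cosine similarity in feature space, and the tensor shapes are all assumptions made for the example.

```python
import numpy as np

def relation_aware_sampling(search_feats, template_feats, n_samples):
    """Illustrative sketch: select the n_samples search points most
    relevant to the template (shapes and scoring are assumptions).

    Each search point is scored by its maximum cosine similarity to
    any template point, so template-like points survive subsampling
    instead of being discarded by random sampling.
    """
    # Normalize features so the dot product is cosine similarity.
    s = search_feats / np.linalg.norm(search_feats, axis=1, keepdims=True)
    t = template_feats / np.linalg.norm(template_feats, axis=1, keepdims=True)
    sim = s @ t.T                    # (N_search, N_template) similarities
    relevance = sim.max(axis=1)      # best match against any template point
    # Indices of the n_samples most template-relevant search points.
    return np.argsort(-relevance)[:n_samples]
```

In the full pipeline these "features" would be learned point embeddings; here any per-point feature vectors suffice to show the selection mechanism.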