Point cloud based 3D single object tracking (3DSOT) has drawn increasing attention. Many breakthroughs have been made, but we identify two severe issues. Through extensive analysis, we find that the prediction manner of current approaches is not robust: there is a misalignment gap between the prediction score and the actual localization accuracy. The second issue is that sparse point returns damage the feature matching procedure of the SOT task. Based on these insights, we introduce two novel modules, Adaptive Refine Prediction (ARP) and Target Knowledge Transfer (TKT), to tackle them, respectively. To this end, we first design a strong pipeline that extracts discriminative features and conducts the matching procedure with the attention mechanism. Then, the ARP module is proposed to tackle the misalignment issue by aggregating all predicted candidates with valuable clues. Finally, the TKT module is designed to effectively overcome incomplete point clouds caused by sparsity and occlusion. We call our overall framework PCET. In extensive experiments on the KITTI and Waymo Open Dataset, our model achieves state-of-the-art performance while maintaining lower computational cost.
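The abstract does not specify how ARP aggregates candidates, so the following is only a minimal sketch of the general idea of combining all predicted candidates instead of trusting the single top-scoring one. It assumes a softmax-weighted average of candidate box centers; the function name `aggregate_candidates`, the `temperature` parameter, and the weighting scheme are illustrative assumptions, not the paper's formulation.

```python
import math


def aggregate_candidates(centers, scores, temperature=1.0):
    """Softmax-weighted average of candidate box centers.

    Hypothetical illustration only: this is NOT the ARP module from the
    paper, just one simple way to aggregate candidates by their scores.

    centers: list of [x, y, z] candidate centers
    scores: list of raw prediction scores, one per candidate
    temperature: assumed knob; smaller values approach picking the top score
    """
    if not centers or len(centers) != len(scores):
        raise ValueError("centers and scores must be non-empty and equal in length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Weighted average, computed separately for each coordinate.
    dim = len(centers[0])
    return [sum(w * c[i] for w, c in zip(weights, centers)) for i in range(dim)]


if __name__ == "__main__":
    candidates = [[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]]
    print(aggregate_candidates(candidates, [1.0, 1.0]))
    print(aggregate_candidates(candidates, [0.0, 10.0], temperature=0.01))
```

With equal scores the result is the plain mean of the candidates, and as `temperature` shrinks the result approaches the top-scoring candidate. This makes the trade-off explicit: when scores are poorly aligned with localization accuracy, relying less on the single highest score can reduce the impact of a misleading one.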