Point cloud based 3D single object tracking (SOT) has drawn increasing attention. Although many breakthroughs have been achieved, we identify two severe issues in current approaches. First, through extensive analysis, we find that their prediction manner is not robust: it exposes a misalignment between the prediction score and the actual localization accuracy. Second, sparse point returns damage the feature matching procedure of the SOT task. Based on these insights, we introduce two novel modules, Adaptive Refine Prediction (ARP) and Target Knowledge Transfer (TKT), to tackle these issues, respectively. We first design a strong pipeline that extracts discriminative features and conducts matching with the attention mechanism. The ARP module then tackles the misalignment issue by aggregating all predicted candidates with valuable clues. Finally, the TKT module is designed to overcome the incomplete point clouds caused by sparsity and occlusion. We call the overall framework PCET. In extensive experiments on the KITTI and Waymo Open Dataset benchmarks, our model achieves state-of-the-art performance while maintaining a lower computational cost.
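To make the score/accuracy misalignment concrete, the following is a minimal sketch contrasting top-1 selection with aggregation over all candidates. It is not the ARP module itself, whose design the abstract does not specify: the softmax weighting, the `temperature` parameter, and the function names `top1_center` and `aggregated_center` are all assumptions made for illustration.

```python
import math

def top1_center(centers, scores):
    """Pick the candidate center with the highest prediction score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return centers[best]

def aggregated_center(centers, scores, temperature=1.0):
    """Softmax-weighted mean of all candidate centers.

    Hypothetical stand-in for 'aggregating all predicted candidates';
    the weighting scheme is an assumption, not the paper's method.
    """
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - m) / temperature) for s in scores]
    total = sum(weights)
    return tuple(
        sum(w * c[d] for w, c in zip(weights, centers)) / total
        for d in range(3)
    )

# Three candidate centers (x, y, z) with raw prediction scores.
centers = [(1.0, 0.0, 0.0), (1.2, 0.1, 0.0), (5.0, 5.0, 0.0)]
scores = [2.0, 2.1, -3.0]

print(top1_center(centers, scores))
print(aggregated_center(centers, scores))
```

With top-1 selection, a small score difference between two nearby candidates fully determines the output. The weighted mean instead lets both high-scoring candidates contribute while the low-scoring outlier receives almost no weight.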