In this paper, we propose a novel voxel-based 3D single object tracking (3D SOT) method called Voxel Pseudo Image Tracking (VPIT). VPIT is the first method that uses voxel pseudo images for 3D SOT. The input point cloud is structured by pillar-based voxelization, and the resulting pseudo image is used as input to a 2D-like Siamese SOT method. The pseudo image is created in Bird's-eye View (BEV) coordinates, so objects in it have a constant size. Thus, in this coordinate system only the object's rotation can change, not its scale. For this reason, we replace multi-scale search with multi-rotation search, where differently rotated search regions are compared against a single target representation to predict both the position and the rotation of the object. Experiments on the KITTI Tracking dataset show that VPIT is the fastest 3D SOT method while maintaining competitive Success and Precision values. Applying an SOT method in a real-world scenario faces limitations such as the lower computational capabilities of embedded devices and a latency-unforgiving environment, where the method is forced to skip data frames if its inference speed is not high enough. We implement a real-time evaluation protocol and show that other methods lose most of their performance on embedded devices, while VPIT maintains its ability to track the object.
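The pillar-to-pseudo-image step and the multi-rotation search can be illustrated with a toy sketch under heavily simplified assumptions: each pillar's feature is just its point count (pillar-based networks such as PointPillars learn a feature vector per pillar instead), and the template and search region are compared by plain cross-correlation of raw grids rather than of learned Siamese feature maps. The function names (`pillar_pseudo_image`, `rotate_grid`, `best_offset`, `multi_rotation_search`) and the nearest-neighbour rotation are hypothetical choices for illustration, not the paper's implementation. Because the grid cell size is fixed in metres, an object covers the same number of cells wherever it appears, so only rotation, not scale, has to be searched.

```python
import math


def pillar_pseudo_image(points, x_range, y_range, cell):
    """Scatter (x, y, z) points into a BEV grid of pillars.

    Simplification (assumption): each pillar's feature is its point count.
    Rows index y, columns index x.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    w = int(round((x1 - x0) / cell))
    h = int(round((y1 - y0) / cell))
    grid = [[0.0] * w for _ in range(h)]
    for x, y, _z in points:  # height is dropped by the pillar projection
        if x0 <= x < x1 and y0 <= y < y1:
            row = min(int((y - y0) / cell), h - 1)
            col = min(int((x - x0) / cell), w - 1)
            grid[row][col] += 1.0
    return grid


def rotate_grid(grid, angle):
    """Rotate a 2D grid about its centre (nearest-neighbour sampling)."""
    h, w = len(grid), len(grid[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    c, s = math.cos(angle), math.sin(angle)
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for q in range(w):
            # Inverse-map each output cell back into the source grid.
            dx, dy = q - cx, r - cy
            sx = c * dx + s * dy + cx
            sy = -s * dx + c * dy + cy
            si, sj = int(round(sy)), int(round(sx))
            if 0 <= si < h and 0 <= sj < w:
                out[r][q] = grid[si][sj]
    return out


def best_offset(template, search):
    """Exhaustive cross-correlation; returns (score, row, col) of the peak."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    best = (-math.inf, 0, 0)
    for r in range(sh - th + 1):
        for q in range(sw - tw + 1):
            score = sum(template[i][j] * search[r + i][q + j]
                        for i in range(th) for j in range(tw))
            if score > best[0]:
                best = (score, r, q)
    return best


def multi_rotation_search(template, search, angles):
    """Compare each rotated copy of the search region against ONE template.

    Returns (score, row, col, angle) of the best match. The offset is
    expressed in the rotated search frame.
    """
    best = None
    for a in angles:
        score, r, q = best_offset(template, rotate_grid(search, a))
        if best is None or score > best[0]:
            best = (score, r, q, a)
    return best


if __name__ == "__main__":
    # Three points forming a bar along y (a "vertical" bar in the grid).
    pts = [(2.5, 1.5, 0.0), (2.5, 2.5, 0.3), (2.5, 3.5, -0.2)]
    search = pillar_pseudo_image(pts, (0.0, 5.0), (0.0, 5.0), 1.0)
    template = [[0.0] * 3, [1.0] * 3, [0.0] * 3]  # horizontal bar
    print(multi_rotation_search(template, search, [0.0, math.pi / 2]))
    # (3.0, 1, 1, 1.5707963267948966)
```

With a vertical three-point bar in the search region and a horizontal bar as the template, the 90° candidate wins with a score of 3 at offset (1, 1), so a single template suffices to recover the rotation. A full tracker would additionally map the offset back through the inverse rotation to obtain the object position in the original frame.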
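The frame-skipping effect that motivates the real-time evaluation can be sketched as a simple scheduler. This is a hypothetical illustration, not the paper's protocol: it assumes frames arrive at a fixed period, a frame is processed only if the tracker is idle when it arrives, and every skipped frame is scored with the prediction of the most recently processed frame. The function name `realtime_schedule`, the 100 ms period (a 10 Hz sensor) and the inference times below are assumed values; times are kept in integer milliseconds to avoid floating-point comparison errors.

```python
def realtime_schedule(num_frames, frame_period_ms, inference_ms):
    """Hypothetical frame-skipping scheduler.

    Returns (processed, source): the indices of frames the tracker actually
    processes, and for every frame the index of the processed frame whose
    prediction is used to score it.
    """
    processed = []
    source = []
    busy_until = 0  # time (ms) at which the tracker becomes idle again
    for i in range(num_frames):
        arrival = i * frame_period_ms
        if arrival >= busy_until:
            # Tracker is idle: it takes this frame and becomes busy.
            processed.append(i)
            busy_until = arrival + inference_ms
        # Skipped frames reuse the last processed frame's prediction
        # (assumption). Frame 0 is always processed, so the list is non-empty.
        source.append(processed[-1])
    return processed, source


if __name__ == "__main__":
    slow, _ = realtime_schedule(10, frame_period_ms=100, inference_ms=250)
    fast, _ = realtime_schedule(10, frame_period_ms=100, inference_ms=40)
    print(slow)       # [0, 3, 6, 9]
    print(len(fast))  # 10
```

Under these assumptions, a tracker needing 250 ms per frame sees only every third frame, and the object can move substantially between the frames it does see, while a tracker needing 40 ms processes all ten.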