3D object detection has received increasing attention in autonomous driving. Objects in 3D scenes appear with diverse orientations, yet ordinary detectors do not explicitly model the variations induced by rotation and reflection transformations. Consequently, large networks and extensive data augmentation are required for robust detection. Recent transformation-equivariant networks explicitly model these variations by applying shared networks to multiple transformed copies of the point cloud, showing great potential for modeling object geometry. However, such networks are difficult to apply to 3D object detection in autonomous driving because of their high computation cost and slow inference speed. In this work, we present TED, an efficient Transformation-Equivariant 3D Detector, to overcome these cost and speed issues. TED first applies a sparse convolution backbone to extract multi-channel transformation-equivariant voxel features; it then aligns and aggregates these equivariant features into lightweight, compact representations for high-performance 3D object detection. On the highly competitive KITTI 3D car detection leaderboard, TED ranked 1st among all submissions while maintaining competitive efficiency.
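The core idea — applying a shared network to multiple transformed copies of the input, then aligning the resulting features back to a common frame before aggregation — can be sketched as follows. This is a minimal toy illustration, not the TED implementation: the rotation group, the `shared_features` stand-in, and the max-pooling aggregation are simplified assumptions standing in for the sparse convolution backbone and alignment modules described in the abstract.

```python
import numpy as np

def rotate_z(points, theta):
    """Rotate an (N, 3) point cloud about the z-axis by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T

def shared_features(points):
    """Toy stand-in for the shared backbone applied to every transformed copy.

    A real detector would run a learned (sparse convolution) network here;
    we use the identity so the equivariance property is easy to verify.
    """
    return points

def equivariant_aggregate(points, num_rotations=4):
    """Extract features from rotated copies, align them back, and pool.

    Mirrors the transformation-equivariant recipe: one shared extractor,
    several transformed inputs, alignment to a canonical frame, then a
    channel-wise aggregation into a single compact representation.
    """
    feats = []
    for k in range(num_rotations):
        theta = 2.0 * np.pi * k / num_rotations
        f = shared_features(rotate_z(points, theta))   # shared weights
        feats.append(rotate_z(f, -theta))              # align back
    return np.max(np.stack(feats, axis=0), axis=0)     # aggregate

pts = np.random.default_rng(0).normal(size=(16, 3))
out = equivariant_aggregate(pts)
rot_out = equivariant_aggregate(rotate_z(pts, 0.5))
# With this (trivially equivariant) extractor, rotating the input
# rotates the aggregated features by the same amount.
```

The aggregation step is what keeps the representation lightweight: instead of carrying `num_rotations` feature maps through the detection head, the aligned copies are pooled into one tensor of the original size.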