In LiDAR-based 3D object detection for autonomous driving, objects occupy a much smaller fraction of the input scene than they do in 2D detection. Overlooking this difference, many 3D detectors directly follow the common practice of 2D detectors and downsample the feature maps, even after the point clouds have already been quantized. In this paper, we start by rethinking how this multi-stride stereotype affects LiDAR-based 3D object detectors. Our experiments show that the downsampling operations bring few advantages and lead to inevitable information loss. To remedy this issue, we propose the Single-stride Sparse Transformer (SST), which maintains the original resolution from the beginning to the end of the network. By using transformers, our method addresses the insufficient receptive field that single-stride architectures otherwise suffer from. It also works naturally with the sparsity of point clouds, avoiding expensive computation. As a result, SST achieves state-of-the-art results on the large-scale Waymo Open Dataset. Notably, owing to its single-stride design, our method achieves strong performance on small-object (pedestrian) detection: 83.8 LEVEL 1 AP on the validation split. Code will be released at https://github.com/TuSimple/SST.
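To make the single-stride, sparsity-aware idea concrete, the following is a minimal NumPy sketch of self-attention restricted to the occupied voxels of each local window. It assumes square, non-overlapping bird's-eye-view windows and single-head attention; the function names `window_ids` and `sparse_window_attention`, the `window_size` parameter, and the projection matrices are hypothetical illustrations, not the released SST code. The key property shown is that empty voxels are never processed and every input voxel gets exactly one output feature, so no downsampling (stride) is introduced.

```python
import numpy as np


def window_ids(coords: np.ndarray, window_size: int) -> np.ndarray:
    """Map integer BEV voxel coordinates (N, 2) to a compact window index per voxel."""
    win = coords // window_size  # (N, 2) coordinates of the window containing each voxel
    # return_inverse gives, for each voxel, the index of its distinct window
    _, inverse = np.unique(win, axis=0, return_inverse=True)
    return inverse.reshape(-1)  # reshape guards against NumPy-version shape differences


def sparse_window_attention(coords, feats, w_q, w_k, w_v, window_size=8):
    """Single-head self-attention among the non-empty voxels of each window.

    Hypothetical sketch: only occupied voxels are attended to, and the output
    keeps one feature vector per input voxel, so resolution is unchanged.
    """
    ids = window_ids(coords, window_size)
    out = np.zeros((feats.shape[0], w_v.shape[1]), dtype=feats.dtype)
    scale = 1.0 / np.sqrt(w_q.shape[1])
    for wid in np.unique(ids):
        idx = np.nonzero(ids == wid)[0]  # voxels that fall in this window
        x = feats[idx]
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        logits = (q @ k.T) * scale
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        attn = np.exp(logits)
        attn /= attn.sum(axis=1, keepdims=True)  # softmax over voxels in the window
        out[idx] = attn @ v
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Five occupied voxels in a large, mostly empty grid
    coords = np.array([[0, 0], [1, 2], [9, 9], [10, 3], [200, 150]])
    feats = rng.normal(size=(5, 4))
    wq, wk, wv = (rng.normal(size=(4, 4)) for _ in range(3))
    out = sparse_window_attention(coords, feats, wq, wk, wv, window_size=8)
    print(out.shape)  # (5, 4): one output per input voxel, no downsampling
```

In this toy input the first two voxels share a window and attend to each other, while each remaining voxel sits alone in its window. Cost scales with the number of occupied voxels per window rather than with the full grid, which is the sense in which such attention pairs well with point-cloud sparsity; enlarging or shifting windows across layers would be one way to grow the receptive field without reducing resolution.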