In this paper, we propose RangeDet, an anchor-free, single-stage, LiDAR-based 3D object detector. The most notable difference from previous works is that our method is based purely on the range view representation. Compared with the commonly used voxelized or Bird's Eye View (BEV) representations, the range view representation is more compact and free of quantization error. Although some works adopt it for semantic segmentation, its object detection performance lags far behind that of its voxelized or BEV counterparts. We first analyze existing range-view-based methods and identify two issues overlooked by previous works: 1) the scale variation between nearby and faraway objects; 2) the inconsistency between the 2D range image coordinates used in feature extraction and the 3D Cartesian coordinates used in the output. We then deliberately design three components in RangeDet to address these issues. We evaluate RangeDet on the large-scale Waymo Open Dataset (WOD). Our best model achieves 72.9/75.9/65.8 3D AP on vehicle/pedestrian/cyclist. These results outperform other range-view-based methods by a large margin (~20 3D AP in vehicle detection) and are overall comparable with state-of-the-art multi-view-based methods. Code will be made public.
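Both issues follow from the geometry of a range image. Each pixel stores a measured range r at a given inclination (row) and azimuth (column). Recovering a 3D point therefore needs a spherical-to-Cartesian conversion, and a fixed angular step between columns covers more metres as r grows. The Python sketch below only illustrates this geometry and is not RangeDet's implementation. It assumes a sensor-centred frame, angles in radians with inclination measured from the horizontal plane, and hypothetical function names, and it ignores the per-beam calibration and sensor extrinsics that real datasets provide.

```python
import math


def range_pixel_to_xyz(r, inclination, azimuth):
    """Convert one range-image pixel to a 3D Cartesian point.

    Assumption: sensor-centred frame, inclination measured from the
    horizontal plane, azimuth measured from the +x axis, radians.
    """
    horizontal = r * math.cos(inclination)  # projection onto the x-y plane
    return (horizontal * math.cos(azimuth),
            horizontal * math.sin(azimuth),
            r * math.sin(inclination))


def range_image_to_points(range_image, inclinations, azimuths):
    """Unproject a whole range image (list of rows) into a point list.

    range_image[i][j] is the range at inclinations[i] and azimuths[j];
    non-positive values mark pixels without a return and are skipped.
    """
    points = []
    for i, row in enumerate(range_image):
        for j, r in enumerate(row):
            if r > 0:
                points.append(range_pixel_to_xyz(r, inclinations[i], azimuths[j]))
    return points


def pixels_spanned(width_m, r, delta_azimuth):
    """Approximate number of columns covered by an object of lateral width
    width_m at range r, using the small-angle approximation arc = r * angle."""
    return width_m / (r * delta_azimuth)
```

With a hypothetical 2048-column sensor, `pixels_spanned(4.0, 10.0, 2 * math.pi / 2048)` is about 130 columns, while the same 4 m object at 50 m spans only about 26. This fivefold drop in apparent size is the kind of scale variation described above. Meanwhile, two neighbouring pixels in the image can be metres apart in 3D, which is why convolving in 2D image coordinates does not match the 3D Cartesian output space.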