Accurate 3D object detection in LiDAR-based point clouds is challenged by data sparsity and irregularity. Existing methods organize the points into a regular structure, e.g., by voxelization, pass them through a designed 2D/3D neural network, and then define object-level anchors that predict offsets of 3D bounding boxes using collective evidence from all the points on the objects of interest. In contrast to these state-of-the-art anchor-based methods, and motivated by the very nature of data sparsity, we observe that even the points on an individual object part carry semantic information about the object. We therefore argue in this paper for an approach opposite to existing methods that use object-level anchors. Inspired by compositional models, which represent an object as parts and their spatial relations, we propose to represent an object as a composition of its interior non-empty voxels, termed hotspots, and the spatial relations of those hotspots. This gives rise to the representation of Object as Hotspots (OHS). Based on OHS, we further propose an anchor-free detection head with a novel ground truth assignment strategy that addresses inter-object point-sparsity imbalance, preventing the network from being biased towards objects with more points. Experimental results show that our proposed method works remarkably well on objects with a small number of points. Notably, our approach ranked 1st on the KITTI 3D Detection Benchmark for cyclist and pedestrian detection, and achieved state-of-the-art performance on the NuScenes 3D Detection Benchmark.
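The hotspot definition above (non-empty voxels in an object's interior) can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: the ground truth box is treated as axis-aligned (no yaw), the names `voxel_index` and `find_hotspots` are hypothetical, and the optional `max_hotspots` cap, which keeps the voxels closest to the box center, is only one plausible way to limit the influence of point-dense objects. The abstract does not specify the actual assignment strategy.

```python
import math
from collections import defaultdict


def voxel_index(point, voxel_size):
    """Map a 3D point to the integer index of the voxel that contains it."""
    return tuple(math.floor(c / s) for c, s in zip(point, voxel_size))


def find_hotspots(points, box_min, box_max, voxel_size, max_hotspots=None):
    """Return indices of non-empty voxels whose points lie inside the box.

    Assumption: the box is axis-aligned, given by its min and max corners.
    Assumption: max_hotspots is a hypothetical cap that illustrates
    balancing between sparse and dense objects.
    """
    counts = defaultdict(int)
    for p in points:
        inside = all(lo <= c <= hi for c, lo, hi in zip(p, box_min, box_max))
        if inside:
            counts[voxel_index(p, voxel_size)] += 1

    hotspots = sorted(counts)  # every key has at least one point
    if max_hotspots is not None and len(hotspots) > max_hotspots:
        center = [(lo + hi) / 2 for lo, hi in zip(box_min, box_max)]

        def dist2(v):
            # Squared distance from the voxel center to the box center.
            return sum(
                ((i + 0.5) * s - c) ** 2
                for i, s, c in zip(v, voxel_size, center)
            )

        hotspots = sorted(hotspots, key=dist2)[:max_hotspots]
    return hotspots


points = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.1), (1.4, 0.2, 0.3), (5.0, 5.0, 5.0)]
box_min, box_max = (0.0, 0.0, 0.0), (2.0, 2.0, 2.0)
voxel_size = (0.5, 0.5, 0.5)

all_hotspots = find_hotspots(points, box_min, box_max, voxel_size)
capped = find_hotspots(points, box_min, box_max, voxel_size, max_hotspots=1)
print(all_hotspots, capped)
```

In this toy scene the point at (5.0, 5.0, 5.0) lies outside the box and contributes no hotspot, while the two points near the origin share a single voxel, so an object's hotspot count depends on how many voxels it occupies rather than on its raw point count.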