Real-time, high-performance 3D object detection is critical for autonomous driving. Recent top-performing 3D object detectors rely mainly on point-based or 3D voxel-based convolutions, both of which are computationally inefficient for onboard deployment. In contrast, pillar-based methods use only 2D convolutions and therefore consume fewer computational resources, but they lag far behind their voxel-based counterparts in detection accuracy. In this paper, by examining the primary performance gap between pillar- and voxel-based detectors, we develop a real-time and high-performance pillar-based detector, dubbed PillarNet. The proposed PillarNet consists of a powerful encoder network for effective pillar feature learning, a neck network for spatial-semantic feature fusion, and the commonly used detection head. Using only 2D convolutions, PillarNet is flexible in its choice of pillar size and compatible with classical 2D CNN backbones such as VGGNet and ResNet. Additionally, PillarNet benefits from our orientation-decoupled IoU regression loss along with the IoU-aware prediction branch. Extensive experimental results on the large-scale nuScenes Dataset and Waymo Open Dataset demonstrate that the proposed PillarNet outperforms state-of-the-art 3D detectors in both effectiveness and efficiency. The source code is available at https://github.com/agent-sgs/PillarNet.git.
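To make the pillar representation concrete, the following is a minimal sketch of generic pillarization: points are binned into vertical columns on an x-y grid, and simple per-pillar statistics are scattered into a dense bird's-eye-view grid that 2D convolutions could then consume. This is not PillarNet's encoder. The function names `pillarize` and `to_bev_image`, the two hand-picked channels (point count and maximum height), and the ranges used are all assumptions made for illustration; a learned encoder would replace the hand-crafted statistics.

```python
import math
from collections import defaultdict


def pillarize(points, pillar_size, x_range, y_range):
    """Group (x, y, z) points into vertical pillars on a 2D grid.

    Hypothetical helper for illustration. Returns a dict mapping
    (row, col) -> list of the points that fall inside that pillar.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    pillars = defaultdict(list)
    for x, y, z in points:
        # Drop points outside the detection range.
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        col = int((x - x_min) // pillar_size)
        row = int((y - y_min) // pillar_size)
        pillars[(row, col)].append((x, y, z))
    return dict(pillars)


def to_bev_image(pillars, pillar_size, x_range, y_range):
    """Scatter per-pillar statistics into dense 2D grids.

    Returns (counts, max_z): two row-major lists of lists. Empty
    pillars keep a count of 0 and a height of 0.0 (an assumption).
    """
    n_cols = math.ceil((x_range[1] - x_range[0]) / pillar_size)
    n_rows = math.ceil((y_range[1] - y_range[0]) / pillar_size)
    counts = [[0] * n_cols for _ in range(n_rows)]
    max_z = [[0.0] * n_cols for _ in range(n_rows)]
    for (row, col), pts in pillars.items():
        counts[row][col] = len(pts)
        max_z[row][col] = max(p[2] for p in pts)
    return counts, max_z


# Toy point cloud; the last point lies outside the range and is dropped.
points = [(0.2, 0.3, 1.0), (0.4, 0.1, 2.0), (3.5, 2.5, 0.5), (9.0, 0.0, 0.0)]
x_range, y_range = (0.0, 4.0), (0.0, 4.0)

pillars = pillarize(points, 1.0, x_range, y_range)
counts, max_z = to_bev_image(pillars, 1.0, x_range, y_range)
```

In this sketch the pillar size is a free parameter: calling the same two functions with `pillar_size=0.5` yields an 8 x 8 grid instead of 4 x 4, with no change to the downstream 2D layout. That is the sense in which a pillar-based pipeline can trade resolution against compute.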