Real-time, high-performance 3D object detection is critical for autonomous driving. Recent top-performing 3D object detectors mainly rely on point-based or 3D voxel-based convolutions, both of which are computationally inefficient for onboard deployment. In contrast, pillar-based methods use only 2D convolutions, which consume fewer computational resources, but they lag far behind their voxel-based counterparts in detection accuracy. In this paper, by examining the primary performance gap between pillar- and voxel-based detectors, we develop a real-time and high-performance pillar-based detector, dubbed PillarNet. The proposed PillarNet consists of a powerful encoder network for effective pillar feature learning, a neck network for spatial-semantic feature fusion, and a commonly used detection head. Using only 2D convolutions, PillarNet accommodates different pillar sizes and is compatible with classical 2D CNN backbones, such as VGGNet and ResNet. Additionally, PillarNet benefits from an orientation-decoupled IoU regression loss along with an IoU-aware prediction branch. Extensive experimental results on the large-scale nuScenes Dataset and Waymo Open Dataset demonstrate that the proposed PillarNet outperforms state-of-the-art 3D detectors in both effectiveness and efficiency.
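To make concrete why a pillar representation needs only 2D convolutions, the following is a minimal sketch of pillarization in plain Python. It is not PillarNet's encoder: the function name `pillarize`, the grid extent, the pillar size, and the choice of keeping the maximum height per pillar are all assumptions made for illustration. The point is only that binning points by their (x, y) position yields a dense 2D bird's-eye-view grid, which a standard 2D CNN backbone could then consume.

```python
def pillarize(points, x_range=(0.0, 4.0), y_range=(0.0, 4.0), pillar_size=1.0):
    """Bin (x, y, z) points into vertical pillars on a bird's-eye-view grid.

    Illustrative only: each pillar keeps the maximum z of its points,
    and empty pillars are filled with 0.0. Returns a list of rows.
    """
    width = int(round((x_range[1] - x_range[0]) / pillar_size))
    height = int(round((y_range[1] - y_range[0]) / pillar_size))

    tallest = {}  # (row, col) -> maximum z seen in that pillar
    for x, y, z in points:
        # Skip points that fall outside the grid extent.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Clamp to guard against floating-point rounding at the upper edge.
        col = min(int((x - x_range[0]) / pillar_size), width - 1)
        row = min(int((y - y_range[0]) / pillar_size), height - 1)
        key = (row, col)
        if key not in tallest or z > tallest[key]:
            tallest[key] = z

    # Scatter the per-pillar values into a dense 2D grid.
    bev = [[0.0] * width for _ in range(height)]
    for (row, col), z in tallest.items():
        bev[row][col] = z
    return bev


# Hypothetical toy point cloud: two points share a pillar, one is out of range.
toy_points = [(0.5, 0.5, 1.0), (0.7, 0.2, 2.0), (3.5, 1.5, -0.5), (9.0, 9.0, 9.0)]
toy_bev = pillarize(toy_points)
```

A smaller `pillar_size` gives a finer grid at higher cost, which is the kind of trade-off the abstract refers to when it mentions pillar size. A learned encoder would replace the hand-picked maximum-height feature used here.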