LiDAR-based 3D object detection for autonomous driving has made remarkable strides in recent years. Among state-of-the-art methods, encoding point clouds into a bird's-eye view (BEV) has proven both effective and efficient. Unlike perspective views, BEV preserves rich spatial and distance information between objects. However, although farther objects of the same type do not appear smaller in BEV, they contain sparser point cloud features. This sparsity weakens BEV feature extraction with shared-weight convolutional neural networks. To address this challenge, we propose the Range-Aware Attention Network (RAANet), which extracts more powerful BEV features and generates superior 3D object detections. The range-aware attention (RAA) convolutions significantly improve feature extraction for both near and far objects. Moreover, we propose a novel auxiliary loss for density estimation to further enhance the detection accuracy of RAANet for occluded objects. Notably, the proposed RAA convolution is lightweight and can be integrated into any CNN architecture used for BEV detection. Extensive experiments on the nuScenes dataset demonstrate that our approach outperforms state-of-the-art methods for LiDAR-based 3D object detection, with real-time inference speeds of 16 Hz for the full version and 22 Hz for the lite version. The code is publicly available in an anonymous GitHub repository: https://github.com/anonymous0522/RAAN.
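The range-dependent sparsity described above can be illustrated with a small toy example. The following is a minimal sketch, not the RAA convolution or any code from the RAANet repository: it assumes a hypothetical 2D scanner with a fixed azimuth step (`az_step_deg`), a flat 2 m wide object facing the sensor, and a simple point-count BEV grid. The function names, grid extent, and cell size are all illustrative assumptions.

```python
import numpy as np

def scan_flat_object(distance, width=2.0, az_step_deg=0.2):
    """Return (N, 2) x/y hits of a toy 2D LiDAR on a flat object facing the sensor.

    The object is the segment x = distance, y in [-width/2, width/2].
    az_step_deg is an assumed azimuth resolution, not a value from the paper.
    """
    az = np.deg2rad(np.arange(-89.0, 89.0, az_step_deg))  # forward-facing beams
    y = distance * np.tan(az)        # where each beam crosses the plane x = distance
    hit = np.abs(y) <= width / 2.0   # keep only beams that land on the object
    return np.stack([np.full(hit.sum(), distance), y[hit]], axis=1)

def bev_counts(points, extent=50.0, cell=0.5):
    """Count points per BEV cell on a square grid covering [-extent, extent]."""
    n = int(round(2 * extent / cell))
    edges = np.linspace(-extent, extent, n + 1)
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=(edges, edges))
    return counts

def y_extent(points):
    """Lateral size of the observed footprint in metres."""
    return points[:, 1].max() - points[:, 1].min()

near = scan_flat_object(10.0)   # same object, 10 m away
far = scan_flat_object(40.0)    # same object, 40 m away
near_bev = bev_counts(near)
far_bev = bev_counts(far)

print("points on object (near, far):", len(near), len(far))
print("footprint width in m (near, far):", y_extent(near), y_extent(far))
print("occupied BEV cells (near, far):", int((near_bev > 0).sum()), int((far_bev > 0).sum()))
```

In this toy setup the object covers roughly the same BEV footprint at both distances, but the far copy receives far fewer returns, so a convolution kernel with shared weights sees very different point densities for the same object class depending on range.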