Recent advances in 3D object detection have been made by developing the refinement stage of voxel-based Region Proposal Networks (RPN) to better balance accuracy and efficiency. A popular approach among state-of-the-art frameworks is to divide proposals, or Regions of Interest (ROI), into grids and extract features at each grid location before aggregating them into the ROI feature. While achieving impressive performance, such an approach involves a number of hand-crafted components (e.g. grid sampling, set abstraction) that require expert knowledge to tune correctly. This paper proposes a data-driven approach to ROI feature computation named APRO3D-Net, which consists of a voxel-based RPN and a refinement stage built on Vector Attention. Unlike standard multi-head attention, which assigns a single scalar weight to each point per head, Vector Attention assigns different weights to different channels within a point feature, and is thus able to capture more sophisticated relations between pooled points and the ROI. Experiments on the KITTI \textit{validation} set show that our method achieves a competitive 84.84 AP on the Car class at Moderate difficulty while having the fewest parameters among closely related methods and attaining a quasi-real-time inference speed of 15 FPS on an NVIDIA V100 GPU. The code is released at https://github.com/quan-dao/APRO3D-Net.
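The following minimal, pure-Python sketch contrasts scalar (dot-product) attention with vector attention for a single ROI query attending over pooled points. It shows generic vector attention with a subtraction relation, in the style popularized by Point Transformer. It is not the APRO3D-Net implementation. The function names and the single linear map `w_gamma` standing in for the relation function are hypothetical. Real models typically use learned projections and a small MLP instead.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a C x C matrix by a length-C vector."""
    return [sum(m_rc * v_c for m_rc, v_c in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_attention(q: Vector, keys: Matrix, values: Matrix) -> Vector:
    """Single-head dot-product attention: one weight per point, shared by
    every channel of that point's value vector."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    w = softmax(scores)  # shape (N,)
    return [sum(w[j] * values[j][c] for j in range(len(values)))
            for c in range(len(q))]


def vector_attention(q: Vector, keys: Matrix, values: Matrix,
                     w_gamma: Matrix) -> Vector:
    """Vector attention sketch: a separate weight per (point, channel).

    Logits are gamma(q - k_j), where gamma is assumed here to be a single
    linear map `w_gamma` (hypothetical). The softmax is taken over points
    independently for each channel.
    """
    n, c_dim = len(keys), len(q)
    # Per-point, per-channel logits, shape (N, C).
    logits = [matvec(w_gamma, [qc - kc for qc, kc in zip(q, k)]) for k in keys]
    out = []
    for c in range(c_dim):
        w_c = softmax([logits[j][c] for j in range(n)])  # weights over points for channel c
        out.append(sum(w_c[j] * values[j][c] for j in range(n)))
    return out


if __name__ == "__main__":
    q = [1.0, 0.0]                        # ROI query feature (C = 2)
    keys = [[1.0, 0.0], [0.0, 1.0]]       # two pooled points
    values = [[10.0, 0.0], [0.0, 10.0]]
    w_gamma = [[-1.0, 0.0], [0.0, -1.0]]  # hypothetical relation map
    print(scalar_attention(q, keys, values))
    print(vector_attention(q, keys, values, w_gamma))
```

In this toy setup, scalar attention favors point 0 for both channels, so channel 1 ends up small. Vector attention can instead let point 0 dominate channel 0 while point 1 dominates channel 1, which is the extra flexibility the per-channel weighting provides.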