Safe autonomous driving technology depends heavily on accurate 3D object detection, since it produces the input to safety-critical downstream tasks such as prediction and navigation. Recent advances in this field have been made by developing the refinement stage of voxel-based region proposal networks to strike a better balance between accuracy and efficiency. A popular approach among state-of-the-art frameworks is to divide proposals, or Regions of Interest (ROIs), into grids and extract a feature for each grid location before synthesizing them into an ROI feature. While achieving impressive performance, such an approach involves a number of hand-crafted components (e.g. grid sampling, set abstraction) which require expert knowledge to be tuned correctly. This paper takes a more data-driven approach to ROI feature extraction using the attention mechanism. Specifically, points inside an ROI are positionally encoded to incorporate the ROI's geometry. The resulting position encodings and the point features are transformed into an ROI feature via vector attention. Unlike the original multi-head attention, vector attention assigns different weights to different channels within a point feature, and is thus able to capture a more sophisticated relation between the pooled points and the ROI. Experiments on the KITTI \textit{validation} set show that our method achieves a competitive performance of 84.84 AP for class Car at Moderate difficulty while having the fewest parameters among closely related methods and attaining a quasi-real-time inference speed of 15 FPS on an NVIDIA V100 GPU. The code will be released.
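The pooling step described above can be illustrated with a minimal sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: the ROI is treated as axis-aligned (no yaw), the position encoding is a single linear map of the point's offset from the ROI center scaled by the ROI size, the attention logits come from one linear map with no separate query term, and all weights are random rather than learned. The names `encode_position`, `vector_attention_pool`, `w_pos`, and `w_attn` are hypothetical. The point of the sketch is the contrast with scalar attention: the softmax over points is taken independently for every channel, so each channel of the ROI feature can favor different points.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]


def random_matrix(rows, cols, rng):
    """Stand-in for learned weights (assumption: random values in [-1, 1])."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def encode_position(point, roi_center, roi_size, w_pos):
    """Map a point's offset from the ROI center, scaled by ROI size, to C channels."""
    rel = [(p - c) / s for p, c, s in zip(point, roi_center, roi_size)]
    return matvec(w_pos, rel)


def vector_attention_pool(points, feats, roi_center, roi_size, w_pos, w_attn):
    """Pool N point features (N x C) into one ROI feature of length C.

    Returns the ROI feature and the attention weights, indexed [channel][point].
    """
    # Value of each point: its feature plus its position encoding.
    values = []
    for p, f in zip(points, feats):
        pos = encode_position(p, roi_center, roi_size, w_pos)
        values.append([a + b for a, b in zip(f, pos)])

    # One logit per (point, channel); scalar attention would have one per point.
    logits = [matvec(w_attn, v) for v in values]
    n_channels = len(w_attn)

    # Softmax across points, independently for every channel.
    weights = [softmax([row[c] for row in logits]) for c in range(n_channels)]

    # Channel-wise weighted sum of the values.
    roi_feat = [
        sum(w * v[c] for w, v in zip(weights[c], values))
        for c in range(n_channels)
    ]
    return roi_feat, weights


if __name__ == "__main__":
    rng = random.Random(0)
    channels = 4
    points = [(1.0, 0.5, 0.2), (-0.5, 0.3, 0.1), (0.2, -0.8, -0.4)]
    feats = [[rng.uniform(-1.0, 1.0) for _ in range(channels)] for _ in points]
    w_pos = random_matrix(channels, 3, rng)
    w_attn = random_matrix(channels, channels, rng)
    roi_feat, weights = vector_attention_pool(
        points, feats, (0.0, 0.0, 0.0), (4.0, 2.0, 1.5), w_pos, w_attn
    )
    print("ROI feature:", roi_feat)
    print("Per-channel weights over points:", weights)
```

Replacing the per-channel softmax with a single softmax over one logit per point would recover ordinary scalar attention pooling, in which every channel of a point shares the same weight.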