BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks

Bird's-Eye-View (BEV) 3D object detection is a crucial multi-view technique for autonomous driving systems. Many recent works follow a similar paradigm consisting of three essential components: camera feature extraction, BEV feature construction, and task heads. Of the three, BEV feature construction is the BEV-specific one, with no counterpart in 2D tasks. Existing methods construct the BEV feature by aggregating the multi-view camera features onto a flattened grid. However, flattening the BEV space along the height dimension fails to emphasize the informative features at different heights. For example, a barrier is located at a low height, while a truck is located at a greater height. In this paper, we propose a novel method named BEV Slice Attention Network (BEV-SAN) that exploits the intrinsic characteristics of different heights. Instead of flattening the BEV space, we first sample along the height dimension to build global and local BEV slices. Then, the features of the BEV slices are aggregated from the camera features and merged by an attention mechanism. Finally, we fuse the merged local and global BEV features with a transformer to generate the final feature map for the task heads. The purpose of the local BEV slices is to emphasize informative heights. To find these heights, we further propose a LiDAR-guided sampling strategy that uses the statistical distribution of LiDAR points to determine the heights of the local slices. Compared with uniform sampling, LiDAR-guided sampling selects more informative heights. We conduct detailed experiments to demonstrate the effectiveness of BEV-SAN. Code will be released.
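
The slicing and merging steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the LiDAR statistics are turned into slice heights or how attention scores are computed, so the quantile-based binning, the mean-activation score, and the function names `uniform_slice_bounds`, `lidar_guided_slice_bounds`, and `merge_slices_with_attention` are all assumptions made for illustration.

```python
import numpy as np


def uniform_slice_bounds(z_min, z_max, num_slices):
    """Split [z_min, z_max] into equal-height slices (uniform sampling baseline)."""
    edges = np.linspace(z_min, z_max, num_slices + 1)
    return [(float(lo), float(hi)) for lo, hi in zip(edges[:-1], edges[1:])]


def lidar_guided_slice_bounds(point_heights, z_min, z_max, num_slices):
    """Choose slice boundaries from the empirical LiDAR height distribution.

    Assumption: slices are equal-count quantile bins, so heights where many
    LiDAR points fall get thinner, more focused slices. The paper may use a
    different statistic.
    """
    z = np.asarray(point_heights, dtype=float)
    z = z[(z >= z_min) & (z <= z_max)]
    if z.size == 0:
        # No usable points: fall back to uniform slices.
        return uniform_slice_bounds(z_min, z_max, num_slices)
    edges = np.quantile(z, np.linspace(0.0, 1.0, num_slices + 1))
    edges[0], edges[-1] = z_min, z_max  # cover the full height range
    return [(float(lo), float(hi)) for lo, hi in zip(edges[:-1], edges[1:])]


def merge_slices_with_attention(slice_feats):
    """Merge per-slice BEV features of shape (S, C, H, W) into one (C, H, W) map.

    Assumption: each slice gets a single scalar score (its mean activation),
    and a softmax over slices gives the merge weights. A learned attention
    module would replace this hand-written score.
    """
    feats = np.asarray(slice_feats, dtype=float)
    scores = feats.mean(axis=(1, 2, 3))
    scores = scores - scores.max()  # numerical stability for the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    merged = np.tensordot(weights, feats, axes=(0, 0))
    return merged, weights
```

With a height distribution concentrated near the ground, `lidar_guided_slice_bounds` places most boundaries in that dense region, whereas `uniform_slice_bounds` spends the same number of slices evenly over the whole range. This is the intuition behind the claim that guided sampling selects more informative heights.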
