Voxel Transformer for 3D Object Detection

We present Voxel Transformer (VoTr), a novel and effective voxel-based Transformer backbone for 3D object detection from point clouds. Conventional 3D convolutional backbones in voxel-based 3D detectors cannot efficiently capture large context information, which is crucial for object recognition and localization, owing to the limited receptive fields. In this paper, we resolve the problem by introducing a Transformer-based architecture that enables long-range relationships between voxels by self-attention. Given the fact that non-empty voxels are naturally sparse but numerous, directly applying standard Transformer on voxels is non-trivial. To this end, we propose the sparse voxel module and the submanifold voxel module, which can operate on the empty and non-empty voxel positions effectively. To further enlarge the attention range while maintaining comparable computational overhead to the convolutional counterparts, we propose two attention mechanisms for multi-head attention in those two modules: Local Attention and Dilated Attention, and we further propose Fast Voxel Query to accelerate the querying process in multi-head attention. VoTr contains a series of sparse and submanifold voxel modules and can be applied in most voxel-based detectors. Our proposed VoTr shows consistent improvement over the convolutional baselines while maintaining computational efficiency on the KITTI dataset and the Waymo Open dataset.
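The idea of attending only over non-empty voxels, with either a dense local neighborhood or a strided (dilated) one, can be illustrated with a small sketch. This is not the authors' implementation: the function names `local_offsets`, `dilated_offsets`, and `sparse_voxel_attention` are hypothetical, a Python dict stands in for whatever lookup structure Fast Voxel Query uses, a single head is shown, and the raw features serve as queries, keys, and values with no learned projections.

```python
import math
from itertools import product


def local_offsets(radius):
    """All integer offsets in a cube of the given radius, including (0, 0, 0)."""
    r = range(-radius, radius + 1)
    return list(product(r, r, r))


def dilated_offsets(radius, stride):
    """The same number of offsets as local_offsets, spread out by `stride`."""
    return [(dx * stride, dy * stride, dz * stride)
            for dx, dy, dz in local_offsets(radius)]


def sparse_voxel_attention(features, offsets):
    """Single-head self-attention restricted to non-empty voxels.

    features: dict mapping (x, y, z) -> list of floats, non-empty voxels only.
    offsets:  candidate neighbor offsets; must include (0, 0, 0).
    Outputs are produced only at the input (non-empty) positions.
    """
    out = {}
    for (x, y, z), query in features.items():
        # Keep only the candidate positions that hold a non-empty voxel.
        keys = [features[(x + dx, y + dy, z + dz)]
                for dx, dy, dz in offsets
                if (x + dx, y + dy, z + dz) in features]
        scale = math.sqrt(len(query))
        scores = [sum(a * b for a, b in zip(query, k)) / scale for k in keys]
        # Numerically stable softmax over the attended voxels.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out[(x, y, z)] = [
            sum(w * k[i] for w, k in zip(weights, keys)) / total
            for i in range(len(query))
        ]
    return out


if __name__ == "__main__":
    voxels = {(0, 0, 0): [1.0, 0.0], (1, 0, 0): [0.0, 1.0], (4, 0, 0): [2.0, 2.0]}
    print(sparse_voxel_attention(voxels, local_offsets(1)))
    print(sparse_voxel_attention(voxels, dilated_offsets(2, 2)))
```

In this toy setup, a dilated pattern with stride 2 lets the voxel at (0, 0, 0) reach the one at (4, 0, 0), which a local pattern of the same radius cannot, while the number of candidate positions per query stays the same for a given radius.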


Related content

Attention mechanism

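The attention mechanism was first proposed in the field of computer vision, but it arguably took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Bahdanau et al. then used an attention-like mechanism in "Neural Machine Translation by Jointly Learning to Align and Translate" [1] to perform translation and alignment jointly in machine translation; their work is generally regarded as the first to bring attention into NLP. Attention-based extensions of RNN models were subsequently applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the rough trend of progress in attention research.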
