Transformers are versatile network architectures that have recently seen great success in 3D point cloud object detection. However, the lack of hierarchy in a plain transformer makes it difficult to learn features at different scales and limits its ability to extract localized features. This limitation leads to imbalanced performance across objects of different sizes, with inferior results on smaller ones. In this work, we propose two novel attention mechanisms as modularized hierarchical designs for transformer-based 3D detectors. To enable feature learning at different scales, we propose Simple Multi-Scale Attention, which builds multi-scale tokens from a single-scale input feature. For localized feature aggregation, we propose Size-Adaptive Local Attention, which adapts the attention range to each bounding box proposal. Both attention modules are model-agnostic network layers that can be plugged into existing point cloud transformers and trained end-to-end. We evaluate our method on two widely used indoor 3D point cloud object detection benchmarks. By plugging the proposed modules into a state-of-the-art transformer-based 3D detector, we improve the previous best results on both benchmarks, with the largest improvement margin on small objects.
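To make the two mechanisms concrete, below is a minimal PyTorch sketch of the ideas as described in the abstract, not the authors' implementation. All module and tensor names are hypothetical; it assumes (B, N, C) point tokens, (B, Q, C) object queries, and axis-aligned (center, size) box proposals, and uses simple average pooling as a stand-in for whatever downsampling a real model would learn.

```python
# Illustrative sketch only; names and design details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMultiScaleAttention(nn.Module):
    """Cross-attention whose keys/values are multi-scale tokens built by
    pooling a single-scale point feature map to coarser resolutions."""

    def __init__(self, dim: int, num_heads: int = 4, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, queries, feats):
        # queries: (B, Q, C) object queries; feats: (B, N, C) single-scale tokens
        tokens = []
        for s in self.scales:
            if s == 1:
                tokens.append(feats)
            else:
                # Coarser tokens via average pooling over the point axis
                # (a stand-in for a learned downsampling operator).
                pooled = F.avg_pool1d(feats.transpose(1, 2), kernel_size=s)
                tokens.append(pooled.transpose(1, 2))
        kv = torch.cat(tokens, dim=1)  # concatenated multi-scale key/value set
        out, _ = self.attn(queries, kv, kv)
        return out


class SizeAdaptiveLocalAttention(nn.Module):
    """Cross-attention restricted, per query, to the points that fall inside
    that query's current box proposal (the adaptive attention range)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, queries, feats, xyz, boxes):
        # xyz: (B, N, 3) point coordinates; boxes: (B, Q, 6) as (center, size)
        center, size = boxes[..., :3], boxes[..., 3:]
        # inside[b, q, n] is True iff point n lies within proposal q's box.
        delta = (xyz[:, None, :, :] - center[:, :, None, :]).abs()
        inside = (delta <= size[:, :, None, :] / 2).all(dim=-1)
        # Keep degenerate (empty) proposals attendable to avoid all-masked rows.
        inside |= ~inside.any(dim=-1, keepdim=True)
        # Boolean attn_mask: True entries are blocked; expand over heads.
        mask = (~inside).repeat_interleave(self.attn.num_heads, dim=0)
        out, _ = self.attn(queries, feats, feats, attn_mask=mask)
        return out
```

Because both modules keep the standard cross-attention interface (queries in, queries out), they can in principle replace the decoder cross-attention layers of an existing transformer detector without other architectural changes, which is what makes the design plug-and-play.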