Transformers have recently achieved great success in computer vision. In 3D object detection, however, their use is constrained because the space and time complexity of attention grows quadratically with the large number of points involved. Previous point-wise methods suffer from high time consumption and from limited receptive fields when capturing information among points. In this paper, we propose a two-stage hyperbolic cosine transformer (ChTR3D) for 3D object detection from LiDAR point clouds. ChTR3D refines proposals by applying cosh-attention, which has linear computational complexity, to encode rich contextual relationships among points. The cosh-attention module reduces the space and time complexity of the attention operation by replacing the traditional softmax operation with a non-negative ReLU activation and a hyperbolic-cosine-based operator with a re-weighting mechanism. Extensive experiments on the widely used KITTI dataset demonstrate that, compared with vanilla attention, cosh-attention significantly improves inference speed while maintaining competitive performance. Experimental results show that, among state-of-the-art two-stage methods using point-level features, ChTR3D is the fastest.
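To make the linear-complexity claim concrete, the following is a minimal NumPy sketch of one way a ReLU-plus-cosh re-weighted attention can be computed without forming the N x N weight matrix. It is an illustration under stated assumptions, not the formulation used in ChTR3D: the abstract does not give the exact operator, so the weight `relu(q_i)·relu(k_j) * cosh(p_i - p_j)`, the scalar per-point position `pos`, and both function names are hypothetical. The sketch relies only on the identity cosh(a - b) = cosh(a)cosh(b) - sinh(a)sinh(b), which lets the re-weighting term be split into per-query and per-key factors.

```python
import numpy as np

def quadratic_attention(q, k, v, pos, eps=1e-6):
    """Reference version: builds the full N x N weight matrix, O(N^2)."""
    qp, kp = np.maximum(q, 0.0), np.maximum(k, 0.0)   # ReLU keeps weights non-negative
    w = (qp @ kp.T) * np.cosh(pos[:, None] - pos[None, :])
    return (w @ v) / (w.sum(axis=1, keepdims=True) + eps)

def linear_cosh_attention(q, k, v, pos, eps=1e-6):
    """Same result in O(N): keys and values are summarized before touching queries."""
    qp, kp = np.maximum(q, 0.0), np.maximum(k, 0.0)
    c = np.cosh(pos)[:, None]   # (N, 1), assumed positions scaled to a small range
    s = np.sinh(pos)[:, None]

    # cosh(a - b) = cosh(a)cosh(b) - sinh(a)sinh(b)
    kv_c = (kp * c).T @ v       # (d, d_v) summary of cosh-weighted keys
    kv_s = (kp * s).T @ v       # (d, d_v) summary of sinh-weighted keys
    num = (qp * c) @ kv_c - (qp * s) @ kv_s

    # Row-wise normalizer, computed with the same decomposition.
    k_c = (kp * c).sum(axis=0)  # (d,)
    k_s = (kp * s).sum(axis=0)
    den = (qp * c) @ k_c - (qp * s) @ k_s
    return num / (den[:, None] + eps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, d_v = 16, 8, 4
    q, k = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    v = rng.normal(size=(n, d_v))
    pos = np.linspace(0.0, 1.0, n)   # hypothetical normalized point index
    same = np.allclose(quadratic_attention(q, k, v, pos),
                       linear_cosh_attention(q, k, v, pos))
    print("linear and quadratic forms agree:", same)
```

In this sketch the cost of the linear form is dominated by the `(d, d_v)` summaries, so it scales with N rather than N squared. Positions are kept in a small range because cosh and sinh grow exponentially and their difference loses precision for large arguments.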