Two major challenges of 3D LiDAR Panoptic Segmentation (PS) are that the point cloud of an object aggregates only on its surface, which makes long-range dependencies hard to model, especially for large instances, and that nearby objects are often too close to each other to separate. Recent literature addresses these problems either with time-consuming grouping processes such as dual-clustering and mean-shift offsets, or with a bird's-eye-view (BEV) dense centroid representation that downplays geometry. However, the local feature learning in these methods does not sufficiently model long-range geometric relationships. To this end, we present SCAN, a novel sparse cross-scale attention network that first aligns multi-scale sparse features with global voxel-encoded attention to capture the long-range relationships of instance context, which boosts the regression accuracy for over-segmented large objects. For the surface-aggregated points, SCAN adopts a novel sparse, class-agnostic representation of instance centroids, which not only preserves the sparsity of the aligned features to resolve under-segmentation of small objects, but also reduces the network's computation through sparse convolution. Our method outperforms previous methods by a large margin on the SemanticKITTI dataset for the challenging 3D PS task, achieving 1st place with real-time inference speed.
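To make the cross-scale alignment idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation, which operates on sparse voxel tensors with sparse convolutions): fine-scale voxel features act as attention queries over a coarser, globally encoded set of voxel features, so long-range instance context can be fused back into local features. The class name `CrossScaleAttention`, the feature dimensions, and the residual fusion are illustrative assumptions.

```python
# Hypothetical sketch of cross-scale attention over voxel features.
# Queries: occupied voxels at a fine scale; keys/values: coarse, globally
# encoded voxels. Not the SCAN source code.
import torch
import torch.nn as nn


class CrossScaleAttention(nn.Module):
    """Toy cross-scale attention aligning fine-scale features with a coarse,
    scene-level context (assumed layout; real SCAN uses sparse tensors)."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, fine_feats: torch.Tensor, coarse_feats: torch.Tensor) -> torch.Tensor:
        # fine_feats:   (B, N_fine, C)   features of occupied fine-scale voxels
        # coarse_feats: (B, N_coarse, C) globally encoded coarse-scale voxel features
        aligned, _ = self.attn(query=fine_feats, key=coarse_feats, value=coarse_feats)
        # Residual fusion so local features keep their geometry while gaining
        # long-range instance context.
        return self.norm(fine_feats + aligned)


if __name__ == "__main__":
    fine = torch.randn(1, 2048, 64)    # e.g. 2048 occupied fine voxels
    coarse = torch.randn(1, 256, 64)   # e.g. 256 coarse voxels covering the scene
    out = CrossScaleAttention()(fine, coarse)
    print(out.shape)  # torch.Size([1, 2048, 64])
```

In this sketch the attention is dense over the occupied voxels of each scale; the paper's contribution is to keep this alignment sparse (and class-agnostic for centroid prediction), which is what keeps inference real-time.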