Position-Guided Point Cloud Panoptic Segmentation Transformer

DEtection TRansformer (DETR) started a trend of using a group of learnable queries for unified visual perception. This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline. Although the naive adaptation obtains fair results, its instance segmentation performance is noticeably inferior to that of previous works. Looking into the details, we observe that instances in sparse point clouds are small relative to the whole scene and often have similar geometry but lack a distinctive appearance for segmentation, a combination that is rare in the image domain. Because instances in 3D are characterized mainly by their positional information, we emphasize its role in modeling and design a robust Mixed-parameterized Positional Embedding (MPE) to guide the segmentation process. The MPE is embedded into the backbone features and later guides the mask prediction and query update processes iteratively, leading to Position-Aware Segmentation (PA-Seg) and Masked Focal Attention (MFA). All these designs impel the queries to attend to specific regions and identify various instances. The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% PQ on the SemanticKITTI and nuScenes benchmarks, respectively. The source code and models are available at https://github.com/SmartBot-PJLab/P3Former.
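The improvements quoted above are measured in PQ (panoptic quality). As a reminder of what that metric captures, the following is a minimal sketch of the commonly used per-class definition: the sum of IoUs over matched segments divided by TP + 0.5·FP + 0.5·FN, where a predicted and a ground-truth segment are usually counted as matched when their IoU exceeds 0.5. The function name and its arguments are hypothetical and are not taken from the P3Former code base; benchmark scores are typically an average of this quantity over classes.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Per-class PQ under the standard definition (illustrative helper).

    matched_ious: IoU values of matched (true positive) segment pairs,
                  assumed to have been matched with IoU > 0.5 already.
    num_false_positives: predicted segments without a match.
    num_false_negatives: ground-truth segments without a match.
    """
    num_true_positives = len(matched_ious)
    denominator = (
        num_true_positives
        + 0.5 * num_false_positives
        + 0.5 * num_false_negatives
    )
    if denominator == 0:
        # No segments of this class in prediction or ground truth.
        return 0.0
    return sum(matched_ious) / denominator


if __name__ == "__main__":
    # Two matched segments, one spurious prediction, one missed instance.
    print(panoptic_quality([0.8, 0.6], 1, 1))
```

Because unmatched predictions and missed instances both enlarge the denominator, a method that separates small, geometrically similar instances more reliably raises PQ even when the IoU of each matched segment stays the same.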


