FIDNet: LiDAR Point Cloud Semantic Segmentation with Fully Interpolation Decoding

Projecting the point cloud onto a 2D spherical range image turns LiDAR semantic segmentation into a 2D segmentation task on the range image. However, a LiDAR range image differs inherently from a regular 2D RGB image; for example, each position on the range image encodes unique geometric information. In this paper, we propose a new projection-based LiDAR semantic segmentation pipeline that consists of a novel network structure and an efficient post-processing step. In our network structure, we design a FID (fully interpolation decoding) module that directly upsamples the multi-resolution feature maps using bilinear interpolation. Inspired by the 3D distance interpolation used in PointNet++, we argue that this FID module is a 2D version of distance interpolation in $(\theta, \phi)$ space. As a parameter-free decoding module, FID greatly reduces model complexity while maintaining good performance. Besides the network structure, we empirically find that our model predictions have clear boundaries between different semantic classes. This makes us rethink whether the widely used K-nearest-neighbor (KNN) post-processing is still necessary for our pipeline. We then realize that the blurring effect is caused by the many-to-one mapping: several points are mapped to the same pixel and share the same label. Therefore, we propose to process those occluded points by assigning the nearest predicted label to them. In the ablation study, this NLA (nearest label assignment) post-processing step outperforms KNN and runs faster at inference. On the SemanticKITTI dataset, our pipeline achieves the best performance among all projection-based methods with $64 \times 2048$ resolution and all point-wise solutions. With a ResNet-34 backbone, both training and testing of our model can be done on a single RTX 2080 Ti with 11 GB of memory. The code is released.
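
The spherical projection and the many-to-one mapping mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the released code: the projection formula is the one commonly used for range images, the vertical field of view (+3 to -25 degrees) is an assumed value for a 64-beam sensor, and the function names `project_to_range_image` and `split_visible_and_occluded` are hypothetical. Yaw and pitch play the role of the angular coordinates of the range image.

```python
import math
from collections import defaultdict


def project_to_range_image(points, height=64, width=2048,
                           fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map each (x, y, z) point to a (row, col, range) entry of a range image.

    The field-of-view defaults are assumptions, not values from the abstract.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down  # total vertical field of view
    pixels = []
    for x, y, z in points:
        rng = math.sqrt(x * x + y * y + z * z)
        yaw = math.atan2(y, x)                 # azimuth in [-pi, pi]
        pitch = math.asin(z / max(rng, 1e-8))  # elevation angle
        # Normalize the angles to image coordinates, then clamp to the grid.
        col = 0.5 * (1.0 - yaw / math.pi) * width
        row = (1.0 - (pitch - fov_down) / fov) * height
        col = min(width - 1, max(0, int(math.floor(col))))
        row = min(height - 1, max(0, int(math.floor(row))))
        pixels.append((row, col, rng))
    return pixels


def split_visible_and_occluded(pixels):
    """Per pixel, keep the closest point as visible; the rest are occluded."""
    by_pixel = defaultdict(list)
    for idx, (row, col, rng) in enumerate(pixels):
        by_pixel[(row, col)].append((rng, idx))
    visible, occluded = [], []
    for hits in by_pixel.values():
        hits.sort()  # smallest range first
        visible.append(hits[0][1])
        occluded.extend(idx for _, idx in hits[1:])
    return sorted(visible), sorted(occluded)


# Two points along the same ray collide in one pixel; the third does not.
points = [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
pixels = project_to_range_image(points)
visible, occluded = split_visible_and_occluded(pixels)
```

In this toy input the point at 20 m lies behind the point at 10 m, so it never contributes to the range image and would simply inherit that pixel's label. These are the occluded points that a post-processing step has to relabel.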

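
To make the idea of parameter-free decoding concrete, here is a small pure-Python sketch of bilinear upsampling applied to feature maps of different resolutions. It is an illustration under stated assumptions, not the FIDNet implementation: the helper names `bilinear_upsample` and `fuse_multi_resolution` are hypothetical, the align-corners sampling convention is a choice made here, and stacking the upsampled maps per pixel is assumed for illustration (the abstract only states that the maps are upsampled). Each output value is a weighted mix of its four nearest grid neighbors, with weights that shrink as the distance along each axis grows, which is the sense in which bilinear interpolation resembles a 2D distance interpolation.

```python
def bilinear_upsample(feature_map, out_h, out_w):
    """Upsample a 2D grid (list of rows) to out_h x out_w, align-corners style."""
    in_h, in_w = len(feature_map), len(feature_map[0])
    out = []
    for i in range(out_h):
        # Fractional source row for this output row.
        src_r = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        r0 = int(src_r)
        r1 = min(r0 + 1, in_h - 1)
        dr = src_r - r0
        row = []
        for j in range(out_w):
            src_c = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            c0 = int(src_c)
            c1 = min(c0 + 1, in_w - 1)
            dc = src_c - c0
            # Weights depend only on the distance to the four neighbors.
            value = ((1 - dr) * (1 - dc) * feature_map[r0][c0]
                     + (1 - dr) * dc * feature_map[r0][c1]
                     + dr * (1 - dc) * feature_map[r1][c0]
                     + dr * dc * feature_map[r1][c1])
            row.append(value)
        out.append(row)
    return out


def fuse_multi_resolution(feature_maps, out_h, out_w):
    """Upsample every map to the same size and stack the values per pixel."""
    upsampled = [bilinear_upsample(fm, out_h, out_w) for fm in feature_maps]
    return [[[up[i][j] for up in upsampled] for j in range(out_w)]
            for i in range(out_h)]


coarse = [[0.0, 2.0],
          [4.0, 6.0]]   # a 2 x 2 map
coarser = [[1.0]]       # a 1 x 1 map
up = bilinear_upsample(coarse, 3, 3)
fused = fuse_multi_resolution([coarse, coarser], 3, 3)
```

No learnable weights appear anywhere in this decoding path, which is why such a module adds no parameters to the model.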