We present a lightweight post-processing method that refines the semantic segmentation results of point cloud sequences. Most existing methods segment each frame independently and therefore face an inherent ambiguity: from a measurement in a single frame, labels are sometimes difficult to predict, even for humans. To remedy this problem, we propose to explicitly train a network to refine the results predicted by an existing segmentation method. The network, which we call P2Net, learns the consistency constraints between coincident points from consecutive frames after registration. We evaluate the proposed post-processing method both qualitatively and quantitatively on the SemanticKITTI dataset, which consists of real outdoor scenes. We validate its effectiveness by comparing the results predicted by two representative networks with and without refinement by the post-processing network. Qualitatively, visualization supports the key idea that the labels of points that are difficult to predict can be corrected with P2Net. Quantitatively, the overall mIoU improves from 10.5% to 11.7% for PointNet [1] and from 10.8% to 15.9% for PointNet++ [2].
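To make the notion of coincident points concrete, the following is a minimal sketch rather than the P2Net pipeline itself. It assumes that registration is available as a known rigid transform, that two points count as coincident when they lie within a fixed radius of each other after registration, and that per-point labels come from some frame-by-frame segmenter. The function names (`transform`, `coincident_pairs`, `label_agreement`), the radius, and the toy data are all hypothetical and not taken from the paper.

```python
import math

def transform(points, rotation, translation):
    """Apply the rigid motion p' = R p + t to every 3D point."""
    return [
        tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for p in points
    ]

def coincident_pairs(points_a, points_b, radius):
    """Pair each point in A with its nearest neighbour in B, if within radius.

    Brute-force search for clarity; a spatial index would be needed at scale.
    """
    pairs = []
    for i, p in enumerate(points_a):
        best_j, best_d = None, radius
        for j, q in enumerate(points_b):
            d = math.dist(p, q)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            pairs.append((i, best_j))
    return pairs

def label_agreement(labels_a, labels_b, pairs):
    """Fraction of coincident pairs whose predicted labels match."""
    if not pairs:
        return 0.0
    same = sum(1 for i, j in pairs if labels_a[i] == labels_b[j])
    return same / len(pairs)

# Toy data (assumed): two consecutive frames with per-point predicted labels.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frame_t = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 0.0)]
labels_t = ["road", "car", "tree"]
frame_t1 = [(-1.0, 0.0, 0.0), (0.02, 0.0, 0.0), (9.0, 9.0, 0.0)]
labels_t1 = ["road", "road", "tree"]

# Assumed ego-motion: the sensor moved 1 m along x between the two frames,
# so frame t+1 is shifted by +1 m in x to express it in frame t coordinates.
registered_t1 = transform(frame_t1, identity, (1.0, 0.0, 0.0))

pairs = coincident_pairs(frame_t, registered_t1, radius=0.1)
agreement = label_agreement(labels_t, labels_t1, pairs)
```

In this toy setup the second coincident pair carries conflicting labels ("car" in one frame, "road" in the other). Disagreements of this kind are the cases a refinement network could learn to resolve; how P2Net actually encodes and uses the consistency constraint is not specified in this abstract.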