Weakly supervised point cloud semantic segmentation methods, which require 1\% or fewer labels while aiming to match the performance of fully supervised approaches, have recently attracted extensive research attention. A typical solution in this setting uses self-training or pseudo-labeling to mine supervision from the point cloud itself, but it ignores critical information available in images. In fact, cameras widely exist in LiDAR scenarios, and this complementary information is greatly important for 3D applications. In this paper, we propose a novel cross-modality weakly supervised method for 3D segmentation that incorporates complementary information from unlabeled images. Specifically, we design a dual-branch network equipped with an active labeling strategy to maximize the power of the tiny fraction of labels and to directly realize 2D-to-3D knowledge transfer. We then establish a cross-modal self-training framework from an Expectation-Maximization (EM) perspective, which iterates between pseudo-label estimation and parameter updating. In the M-step, we propose cross-modal association learning to mine complementary supervision from images by reinforcing the cycle-consistency between 3D points and 2D superpixels. In the E-step, a pseudo-label self-rectification mechanism filters noisy labels, providing more accurate labels so that the networks can be fully trained. Extensive experimental results demonstrate that our method even outperforms state-of-the-art fully supervised competitors with less than 1\% actively selected annotations.
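The EM-style alternation described above can be sketched as a minimal loop. This is a hypothetical toy illustration, not the authors' implementation: the confidence-threshold filter stands in for the pseudo-label self-rectification of the E-step, and a simple per-class weight update stands in for network training in the M-step.

```python
# Toy sketch of a self-training loop from an EM perspective.
# All names (e_step, m_step, self_train) and the scalar "parameters"
# are illustrative assumptions, not the paper's actual method.

def e_step(scores, threshold=0.8):
    """E-step: pseudo-label estimation with self-rectification.

    `scores` maps a point id to (predicted_label, confidence);
    predictions below the confidence threshold are treated as
    noisy labels and filtered out.
    """
    return {pid: label
            for pid, (label, conf) in scores.items()
            if conf >= threshold}

def m_step(params, pseudo_labels, lr=0.1):
    """M-step: parameter update on the filtered pseudo labels.

    Here the "parameters" are just per-class weights nudged toward
    the empirical pseudo-label distribution, standing in for a
    gradient step on the dual-branch network.
    """
    counts = {}
    for label in pseudo_labels.values():
        counts[label] = counts.get(label, 0) + 1
    total = sum(counts.values()) or 1
    for label, c in counts.items():
        target = c / total
        params[label] = params.get(label, 0.0) + lr * (target - params.get(label, 0.0))
    return params

def self_train(scores, params, rounds=3):
    """Iterate between pseudo-label estimation and parameter updating."""
    pseudo = {}
    for _ in range(rounds):
        pseudo = e_step(scores)          # estimate pseudo labels
        params = m_step(params, pseudo)  # update parameters
    return params, pseudo
```

In the real framework, the E-step would also draw on cross-modal association with 2D superpixels rather than a single confidence score, but the control flow, estimate, filter, update, repeat, follows the same shape.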