Image instance segmentation is a fundamental research topic in autonomous driving and is crucial for scene understanding and road safety. Advanced learning-based approaches often rely on costly 2D mask annotations for training. In this paper, we present LiDAR-guided Weakly Supervised Instance Segmentation (LWSIS), a framework that leverages off-the-shelf 3D data, i.e., point clouds together with 3D boxes, as natural weak supervision for training 2D image instance segmentation models. LWSIS not only exploits the complementary information in multimodal data during training, but also significantly reduces the cost of annotating dense 2D masks. In detail, LWSIS consists of two crucial modules: Point Label Assignment (PLA) and Graph-based Consistency Regularization (GCR). The former automatically assigns the 3D point cloud as 2D point-wise labels, while the latter further refines the predictions by enforcing geometry and appearance consistency across the multimodal data. Moreover, we provide a secondary set of instance segmentation annotations on nuScenes, named nuInsSeg, to encourage further research on multimodal perception tasks. Extensive experiments on nuInsSeg, as well as on the large-scale Waymo dataset, show that LWSIS can substantially improve existing weakly supervised segmentation models while using 3D data only during training. Additionally, LWSIS can be incorporated into 3D object detectors such as PointPainting to boost 3D detection performance at no extra cost. The code and dataset are available at https://github.com/Serenos/LWSIS.
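To make the idea behind point label assignment concrete, the following is a minimal sketch and not the released LWSIS implementation. It assumes the LiDAR points are already expressed in the camera frame, that the camera follows a pinhole model with intrinsic matrix `K`, and, as a simplification, that the 3D box is axis-aligned in that frame (real 3D boxes are oriented). The function name `assign_point_labels` and all of its parameters are hypothetical.

```python
import numpy as np


def assign_point_labels(points_cam, box_min, box_max, K, image_size):
    """Project 3D points into the image and label them by 3D-box membership.

    points_cam : (N, 3) points in the camera frame (assumed, see lead-in).
    box_min, box_max : (3,) corners of an axis-aligned 3D box (simplification).
    K : (3, 3) pinhole intrinsic matrix.
    image_size : (width, height) in pixels.
    Returns (uv, labels): pixel coordinates (M, 2) and 0/1 labels (M,)
    for the points that fall inside the image.
    """
    points_cam = np.asarray(points_cam, dtype=float)
    K = np.asarray(K, dtype=float)

    # Keep only points in front of the camera (positive depth).
    pts = points_cam[points_cam[:, 2] > 0]

    # Pinhole projection: multiply by K, then divide by depth.
    uvw = pts @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]

    # Discard projections that land outside the image bounds.
    width, height = image_size
    in_image = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width)
        & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    pts, uv = pts[in_image], uv[in_image]

    # A point inside the 3D box becomes a foreground (1) point label;
    # every other projected point becomes a background (0) label.
    inside = np.all((pts >= box_min) & (pts <= box_max), axis=1)
    return uv, inside.astype(np.int64)
```

The resulting sparse pixel labels could then supervise a 2D mask head at the labeled locations only. How LWSIS handles oriented boxes, occlusion, and ambiguous points is not specified in the abstract and is not covered by this sketch.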