Scene understanding is a major challenge in today's computer vision. Central to this task is image segmentation, since scenes are often provided as a set of pictures. Nowadays, many such datasets also provide 3D geometry information, given as a 3D point cloud acquired by a laser scanner or a depth camera. To exploit this geometric information, many current approaches rely on both a 2D loss and a 3D loss, requiring not only 2D per-pixel labels but also 3D per-point labels. However, obtaining 3D ground truth is challenging, time-consuming, and error-prone. In this paper, we show that image segmentation can benefit from 3D geometric information without requiring any 3D ground truth, by training the geometric feature extraction with a 2D segmentation loss in an end-to-end fashion. Our method starts by extracting a map of 3D features directly from the point cloud using a lightweight and simple 3D encoder neural network. The 3D feature map is then used as an additional input to a classical image segmentation network. During training, the 3D feature extraction is optimized for the segmentation task by back-propagation through the entire pipeline. Our method exhibits state-of-the-art performance with much lighter input dataset requirements, since no 3D ground truth is required.
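The pipeline described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the array sizes, the single-layer "3D encoder", and the per-pixel average pooling used to build the image-aligned feature map are all assumptions made for the sake of a self-contained example. It only shows how point-cloud features become extra input channels for a standard 2D segmentation network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inputs (hypothetical sizes): an RGB image and a point cloud,
# with a known 2D projection of each 3D point onto the image plane.
H, W = 8, 8
image = rng.standard_normal((H, W, 3))
points = rng.standard_normal((100, 3))          # N points, (x, y, z)
pixel_uv = rng.integers(0, W, size=(100, 2))    # (u, v) projection per point

# "Lightweight 3D encoder": here a single shared per-point layer,
# a toy stand-in for the encoder network trained in the paper.
C = 4                                           # 3D feature channels
W3d = rng.standard_normal((3, C)) * 0.1
point_feats = np.tanh(points @ W3d)             # N x C per-point features

# Build the image-aligned 3D feature map by pooling point features
# into the pixels they project to (average pooling, an assumption).
feat_map = np.zeros((H, W, C))
counts = np.zeros((H, W, 1))
for (u, v), f in zip(pixel_uv, point_feats):
    feat_map[v, u] += f
    counts[v, u] += 1
feat_map = feat_map / np.maximum(counts, 1)

# Fusion: the 3D feature map is concatenated to the RGB channels and
# fed to a classical 2D segmentation network (not implemented here).
# Training the whole pipeline with only a 2D segmentation loss would
# back-propagate through this fusion into the 3D encoder weights.
fused = np.concatenate([image, feat_map], axis=-1)
print(fused.shape)                              # (H, W, 3 + C)
```

Because every step from the 3D encoder to the fused input is differentiable, a 2D per-pixel loss alone suffices to optimize the 3D feature extraction, which is the key point of the abstract.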