Unsupervised contrastive learning for indoor-scene point clouds has achieved great success. However, unsupervised learning on point clouds in outdoor scenes remains challenging, because previous methods need to reconstruct the whole scene and capture partial views for the contrastive objective. This is infeasible in outdoor scenes, which contain moving objects, obstacles, and moving sensors. In this paper, we propose CO^3, namely Cooperative Contrastive Learning and Contextual Shape Prediction, to learn 3D representations for outdoor-scene point clouds in an unsupervised manner. CO^3 has several merits compared to existing methods. (1) It utilizes LiDAR point clouds from the vehicle side and the infrastructure side to build views that differ sufficiently while maintaining common semantic information for contrastive learning; these views are more appropriate than those built by previous methods. (2) Alongside the contrastive objective, shape context prediction is proposed as a pre-training goal and brings more task-relevant information to unsupervised 3D point cloud representation learning, which is beneficial when transferring the learned representation to downstream detection tasks. (3) Compared to previous methods, the representation learned by CO^3 can be transferred to different outdoor-scene datasets collected by different types of LiDAR sensors. (4) CO^3 improves current state-of-the-art methods on both the Once and KITTI datasets by up to 2.58 mAP. Code and models will be released. We believe CO^3 will facilitate understanding of LiDAR point clouds in outdoor scenes.
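To make the cooperative contrastive idea concrete, the sketch below shows a standard symmetric InfoNCE-style objective applied to paired per-point features from the two views. This is a minimal illustration only: the function name `info_nce`, the NumPy implementation, and the temperature value are assumptions for exposition, not the paper's actual loss; CO^3's full formulation (including how point correspondences between vehicle-side and infrastructure-side clouds are established) is not reproduced here.

```python
import numpy as np

def info_nce(z_vehicle, z_infra, temperature=0.1):
    """Contrastive loss between two (N, D) feature sets, where row i of each
    view is assumed to describe the same 3D point (positives on the diagonal,
    all other pairs treated as negatives). Illustrative sketch, not CO^3's loss."""
    # L2-normalize each per-point feature vector
    a = z_vehicle / np.linalg.norm(z_vehicle, axis=1, keepdims=True)
    b = z_infra / np.linalg.norm(z_infra, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives lie on the diagonal: point i in view A matches point i in view B
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
# identical views: diagonal dominates, so the loss should be near zero
loss_aligned = info_nce(feats, feats)
# unrelated views: positives are no more similar than negatives
loss_random = info_nce(feats, rng.normal(size=(8, 16)))
print(loss_aligned, loss_random)
```

Features from well-aligned views yield a lower loss than features from unrelated views, which is exactly the pressure that pushes the encoder to produce consistent representations across the two LiDAR viewpoints.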