Existing approaches for unsupervised point cloud pre-training are constrained to either scene-level or point/voxel-level instance discrimination. Scene-level methods tend to lose the local details that are crucial for recognizing road objects, while point/voxel-level methods inherently suffer from a limited receptive field that is incapable of perceiving large objects or contextual environments. Since region-level representations are better suited to 3D object detection, we devise a new unsupervised point cloud pre-training framework, called ProposalContrast, that learns robust 3D representations by contrasting region proposals. Specifically, given an exhaustive set of region proposals sampled from each point cloud, geometric point relations within each proposal are modeled to create expressive proposal representations. To better accommodate the properties of 3D detection, ProposalContrast optimizes for both inter-cluster and inter-proposal separation, i.e., sharpening the discriminability of proposal representations across semantic classes and object instances. The generalizability and transferability of ProposalContrast are verified on various 3D detectors (i.e., PV-RCNN, CenterPoint, PointPillars and PointRCNN) and datasets (i.e., KITTI, Waymo and ONCE).
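The inter-proposal separation objective described above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes each region proposal has already been encoded into an embedding under two augmented views of the same point cloud, and applies a standard InfoNCE-style contrastive loss in which matched proposals across views form positive pairs and all other proposals act as negatives:

```python
# Hypothetical sketch: proposal-level InfoNCE loss, assuming rows i of
# z1 and z2 embed the SAME region proposal under two augmented views.
import numpy as np

def proposal_info_nce(z1, z2, temperature=0.1):
    """Contrastive loss over (N, D) proposal embeddings from two views.

    Matched rows are positives; mismatched rows are negatives, which
    pushes embeddings of different proposals apart (inter-proposal
    separation).
    """
    # L2-normalize so dot products become cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal of the similarity matrix.
    return -np.mean(np.diag(log_prob))
```

In the full framework this loss would be complemented by the inter-cluster term, which groups proposals by (pseudo-)semantic class rather than treating every other proposal as a negative; that part is omitted here for brevity.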