We show how the inherent, but often neglected, properties of large-scale LiDAR point clouds can be exploited for effective self-supervised representation learning. To this end, we design a highly data-efficient feature pre-training backbone that significantly reduces the amount of tedious 3D annotation required to train state-of-the-art object detectors. In particular, we propose a Masked AutoEncoder (MAELi) that intuitively leverages the sparsity of LiDAR point clouds in both the encoder and the decoder during reconstruction. This results in more expressive and useful features, directly applicable to downstream perception tasks, such as 3D object detection for autonomous driving. In a novel reconstruction scheme, MAELi distinguishes between free and occluded space and employs a new masking strategy that targets the LiDAR's inherent spherical projection. To demonstrate the potential of MAELi, we pre-train one of the most widespread 3D backbones in an end-to-end fashion and show the merit of our fully unsupervised pre-trained features on several 3D object detection architectures. Given only a tiny fraction of labeled frames to fine-tune such detectors, we achieve significant performance improvements. For example, with only $\sim800$ labeled frames, MAELi features improve a SECOND model by +10.09 APH/LEVEL 2 on Waymo Vehicles.