We demonstrate how the often overlooked inherent properties of large-scale LiDAR point clouds can be effectively utilized for self-supervised representation learning. In pursuit of this goal, we design a highly data-efficient feature pre-training backbone that considerably reduces the need for tedious 3D annotations to train state-of-the-art object detectors. We propose Masked AutoEncoder for LiDAR point clouds (MAELi), which intuitively leverages the sparsity of LiDAR point clouds in both the encoder and decoder during reconstruction. Our approach yields more expressive and useful features, which can be directly applied to downstream perception tasks, such as 3D object detection for autonomous driving. In a novel reconstruction scheme, MAELi distinguishes between free and occluded space and employs a new masking strategy that targets the LiDAR's inherent spherical projection. To demonstrate the potential of MAELi, we pre-train one of the most widely used 3D backbones in an end-to-end manner and show the effectiveness of our unsupervised pre-trained features on various 3D object detection architectures. Our method achieves significant performance improvements when only a small fraction of labeled frames is available for fine-tuning object detectors. For instance, with ~800 labeled frames, MAELi features enhance a SECOND model by +10.79 APH/LEVEL 2 on Waymo Vehicles.
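To make the projection-aware masking mentioned above more concrete, the following is a minimal, hedged sketch (not the authors' implementation) of one way such a strategy could look: points are binned onto a spherical range-image-like grid and random cells of that grid are hidden. The function name `spherical_mask`, the grid resolution, and the mask ratio are hypothetical illustration choices, not values from the paper.

```python
# Illustrative sketch only: masking LiDAR points by dropping random cells of their
# spherical (range-image-like) projection. Not the authors' exact method; grid size
# and mask ratio are hypothetical.
import numpy as np

def spherical_mask(points, mask_ratio=0.7, az_bins=1024, el_bins=64, seed=0):
    """points: (N, 3) array of x, y, z LiDAR coordinates.
    Returns (visible_points, visible_mask) with visible_mask a boolean (N,) array."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1) + 1e-9
    azimuth = np.arctan2(y, x)                         # [-pi, pi)
    elevation = np.arcsin(np.clip(z / r, -1.0, 1.0))   # [-pi/2, pi/2]

    # Discretize onto a spherical grid resembling the sensor's scan pattern.
    az_idx = ((azimuth + np.pi) / (2 * np.pi) * az_bins).astype(int) % az_bins
    el_min, el_max = elevation.min(), elevation.max()
    el_idx = ((elevation - el_min) / (el_max - el_min + 1e-9) * (el_bins - 1)).astype(int)
    cell_id = el_idx * az_bins + az_idx

    # Randomly mask a fraction of the occupied cells; points in masked cells are hidden.
    rng = np.random.default_rng(seed)
    occupied = np.unique(cell_id)
    n_masked = int(mask_ratio * len(occupied))
    masked_cells = set(rng.choice(occupied, size=n_masked, replace=False).tolist())
    visible_mask = np.array([c not in masked_cells for c in cell_id])
    return points[visible_mask], visible_mask

# Usage example on a synthetic point cloud.
pts = np.random.randn(100000, 3) * 20
visible, mask = spherical_mask(pts)
print(f"kept {mask.mean():.1%} of points")
```

In an actual pre-training pipeline, the visible points would be fed to the sparse encoder while the reconstruction objective targets the hidden cells; this sketch only covers the masking step.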