Masked auto-encoding is a popular and effective self-supervised learning approach for point clouds. However, most existing methods reconstruct only the masked points and overlook local geometry information, which is also important for understanding point cloud data. In this work, we make the first attempt, to the best of our knowledge, to explicitly incorporate local geometry information into masked auto-encoding, and propose a novel Masked Surfel Prediction (MaskSurf) method. Specifically, given an input point cloud masked at a high ratio, we train a transformer-based encoder-decoder network to estimate the underlying masked surfels by simultaneously predicting the surfel positions (i.e., points) and per-surfel orientations (i.e., normals). The predictions of points and normals are supervised by the Chamfer Distance and a newly introduced Position-Indexed Normal Distance, respectively, in a set-to-set manner. MaskSurf is validated on six downstream tasks under three fine-tuning strategies. In particular, MaskSurf outperforms its closest competitor, Point-MAE, by 1.2\% on the real-world ScanObjectNN dataset under the OBJ-BG setting, demonstrating the advantage of masked surfel prediction over masked point cloud reconstruction. Code will be available at https://github.com/YBZh/MaskSurf.
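To make the set-to-set supervision concrete, the following is a minimal NumPy sketch, not the authors' implementation. The Chamfer Distance follows its standard definition. The abstract does not define the Position-Indexed Normal Distance, so the version here is an assumption: normals are paired using the nearest-neighbour indices found when matching positions, and each pair is scored by one minus the absolute cosine similarity. The function names and the sign-invariant scoring are hypothetical choices made for illustration.

```python
import numpy as np


def chamfer_distance(pred_pts, target_pts):
    """Symmetric Chamfer Distance between point sets of shape (N, 3) and (M, 3).

    Also returns the nearest-neighbour indices in both directions so that
    other per-point attributes can be matched with the same pairing.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    diff = pred_pts[:, None, :] - target_pts[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    idx_p2t = d2.argmin(axis=1)  # nearest target point for each predicted point
    idx_t2p = d2.argmin(axis=0)  # nearest predicted point for each target point
    cd = d2.min(axis=1).mean() + d2.min(axis=0).mean()
    return cd, idx_p2t, idx_t2p


def position_indexed_normal_distance(pred_n, target_n, idx_p2t, idx_t2p):
    """Assumed form of a position-indexed normal distance (illustrative only).

    Normals are paired by the position-based indices above rather than by
    searching for the closest normal, then scored with 1 - |cos|.
    """
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    pn, tn = unit(pred_n), unit(target_n)
    d_p2t = 1.0 - np.abs(np.sum(pn * tn[idx_p2t], axis=-1))
    d_t2p = 1.0 - np.abs(np.sum(tn * pn[idx_t2p], axis=-1))
    return d_p2t.mean() + d_t2p.mean()


# Toy surfels: three points on the z = 0 plane with upward-facing normals.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
nrm = np.tile(np.array([0.0, 0.0, 1.0]), (3, 1))

cd, i_pt, i_tp = chamfer_distance(pts, pts)
nd = position_indexed_normal_distance(nrm, nrm, i_pt, i_tp)
```

For identical predicted and target surfels both terms are zero. A total training loss would presumably combine the two terms with some weighting, but the abstract does not state how.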