We present an innovative two-headed attention layer that combines geometric and latent features to segment a 3D scene into semantically meaningful subsets. Each head combines local and global information from a neighborhood of points, using either the geometric or the latent features, and uses this information to learn better local relationships. This Geometric-Latent attention layer (Ge-Latto) is combined with a sub-sampling strategy to capture global features. Our method is permutation-invariant thanks to its shared-MLP layers, and it can also be applied to point clouds of varying density because the local attention layer does not depend on the neighbor order. Our proposal is simple yet robust: it achieves competitive results on the ShapeNetPart and ModelNet40 datasets and state-of-the-art results on the complex S3DIS dataset, with 69.2% IoU on Area 5 and 89.7% overall accuracy using K-fold cross-validation over the 6 areas.
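To make the permutation-invariance argument concrete, the following is a minimal NumPy sketch of a two-headed neighborhood attention step. It is not the authors' implementation: the function names (`shared_mlp`, `attention_head`, `two_head_layer`), the layer sizes, the use of relative coordinates as the geometric input, and the concatenation of the two head outputs are all assumptions made for illustration. It only shows why applying the same MLP to every neighbor, followed by a softmax-weighted sum over the neighborhood, yields an output that does not depend on the neighbor order.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def shared_mlp(x, w, b):
    # The same weights are applied to every neighbor row independently (ReLU).
    return np.maximum(x @ w + b, 0.0)


def init_head(rng, d_in, d_hidden, d_out):
    # Hypothetical parameter layout for one attention head.
    return {
        "w1": rng.normal(size=(d_in, d_hidden)),
        "b1": np.zeros(d_hidden),
        "w2": rng.normal(size=(d_hidden, d_out)),
    }


def attention_head(keys, values, params):
    # keys: (K, d_in) per-neighbor inputs used to score each neighbor.
    # values: (K, d_out) per-neighbor features to aggregate.
    hidden = shared_mlp(keys, params["w1"], params["b1"])  # (K, d_hidden)
    scores = hidden @ params["w2"]                         # (K, d_out)
    weights = softmax(scores, axis=0)                      # normalize over neighbors
    return (weights * values).sum(axis=0)                  # (d_out,)


def two_head_layer(center_xyz, neigh_xyz, neigh_feat, geo_params, lat_params):
    # Geometric head: scores come from coordinates relative to the center
    # (an assumption of this sketch).
    rel_xyz = neigh_xyz - center_xyz
    geo_out = attention_head(rel_xyz, neigh_feat, geo_params)
    # Latent head: scores come from the neighbor features themselves.
    lat_out = attention_head(neigh_feat, neigh_feat, lat_params)
    # Combining the heads by concatenation is also an assumption.
    return np.concatenate([geo_out, lat_out])


rng = np.random.default_rng(0)
K, D = 16, 8                                   # neighbors, feature channels
center = rng.normal(size=3)
xyz = center + 0.1 * rng.normal(size=(K, 3))   # synthetic neighborhood
feat = rng.normal(size=(K, D))                 # synthetic latent features
geo = init_head(rng, 3, 32, D)
lat = init_head(rng, D, 32, D)

out = two_head_layer(center, xyz, feat, geo, lat)

# Shuffle the neighbors: the aggregated output should match up to rounding.
perm = rng.permutation(K)
out_perm = two_head_layer(center, xyz[perm], feat[perm], geo, lat)
print(np.allclose(out, out_perm))
```

Because the softmax normalizes over whatever neighbors are present and the sum runs over the whole neighborhood, the same code accepts any neighborhood size `K`, which is the property the abstract relies on for point clouds of varying density.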