In this paper, we tackle the problem of unsupervised 3D object segmentation from a point cloud without RGB information. In particular, we propose a framework, SPAIR3D, that models a point cloud as a spatial mixture model and jointly learns the multiple-object representation and segmentation in 3D via Variational Autoencoders (VAE). Inspired by SPAIR, we adopt an object-specification scheme that describes each object's location relative to its local voxel grid cell rather than the point cloud as a whole. To define the spatial mixture model on point clouds, we derive the Chamfer Likelihood, which fits naturally into the variational training pipeline. We further design a new spatially invariant graph neural network to generate a varying number of 3D points as the decoder within our VAE. Experimental results demonstrate that SPAIR3D is capable of detecting and segmenting a variable number of objects without appearance information across diverse scenes.
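The abstract only names the Chamfer Likelihood; its precise definition appears in the body of the paper. As a rough, non-authoritative sketch of the underlying idea, the NumPy snippet below scores each observed point by a Gaussian density centered at its nearest generated point, which is one simple way a Chamfer-distance-based log-likelihood can be written. The function name, the `sigma` parameter, and the isotropic Gaussian form are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def chamfer_log_likelihood(observed, generated, sigma=0.05):
    """Illustrative Chamfer-style log-likelihood (not the paper's exact form).

    observed:  (N, 3) array, the input point cloud.
    generated: (M, 3) array, points decoded for one mixture component.
    Each observed point is scored by an isotropic Gaussian (std `sigma`)
    centered at its nearest generated point.
    """
    # Pairwise squared distances between observed and generated points, shape (N, M).
    d2 = np.sum((observed[:, None, :] - generated[None, :, :]) ** 2, axis=-1)
    nearest = d2.min(axis=1)  # (N,) squared distance to nearest generated point
    # 3D Gaussian log-density of each observed point at its nearest neighbor.
    log_p = -nearest / (2.0 * sigma**2) - 1.5 * np.log(2.0 * np.pi * sigma**2)
    return log_p.sum()

# Toy usage: a slightly perturbed copy of the cloud should score far higher
# than an unrelated random cloud.
rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(256, 3))
noisy = cloud + rng.normal(scale=0.01, size=cloud.shape)
random_pts = rng.uniform(-1.0, 1.0, size=(256, 3))
assert chamfer_log_likelihood(cloud, noisy) > chamfer_log_likelihood(cloud, random_pts)
```

Because such a likelihood is built from nearest-neighbor distances rather than a fixed point-to-point correspondence, it remains well defined when the decoder emits a different number of points than the input contains, which is why a Chamfer-style term fits a variational training pipeline over point clouds.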