In this paper, we tackle the problem of unsupervised 3D object segmentation from a point cloud without RGB information. In particular, we propose a framework,~{\bf SPAIR3D}, that models a point cloud as a spatial mixture model and jointly learns multi-object representation and segmentation in 3D via Variational Autoencoders (VAEs). Inspired by SPAIR, we adopt an object-specification scheme that describes each object's location relative to its local voxel grid cell rather than the point cloud as a whole. To model the spatial mixture model on point clouds, we derive the~\emph{Chamfer Likelihood}, which fits naturally into the variational training pipeline. We further design a new spatially invariant graph neural network to generate a varying number of 3D points as a decoder within our VAE.~Experimental results demonstrate that~{\bf SPAIR3D} is capable of detecting and segmenting a variable number of objects without appearance information across diverse scenes.