In this paper, we learn a diffusion model that generates 3D data at scene scale. Specifically, our model crafts a 3D scene consisting of multiple objects, whereas recent diffusion research has focused on single objects. To this end, we represent a scene with discrete class labels, i.e., a categorical distribution, assigning multiple objects to semantic categories. We therefore extend discrete diffusion models to learn scene-scale categorical distributions. In addition, we validate that a latent diffusion model reduces computation costs for training and deployment. To the best of our knowledge, our work is the first to apply discrete and latent diffusion to scene-scale 3D categorical data. We further propose to perform semantic scene completion (SSC) by learning a conditional distribution with our diffusion model, where the condition is a partial observation given as a sparse point cloud. In experiments, we empirically show that our diffusion models not only generate reasonable scenes but also outperform a discriminative model on the scene completion task. Our code and models are available at https://github.com/zoomin-lee/scene-scale-diffusion
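To make the "discrete diffusion over categorical labels" idea concrete, here is a minimal sketch of one forward (corruption) step with a uniform transition kernel: each voxel's semantic class label is kept with probability 1 - beta and otherwise resampled uniformly over the classes. This is an illustrative toy, not the paper's exact transition matrix; `forward_corrupt`, `num_classes`, and `beta` are hypothetical names introduced here.

```python
import random

def forward_corrupt(labels, num_classes, beta, rng):
    """One discrete-diffusion forward step (uniform kernel, illustrative).

    labels: flat list of integer class ids for the scene's voxels.
    beta:   per-step corruption probability; with prob beta a label is
            replaced by a uniformly random class, else it is kept.
    """
    return [
        label if rng.random() >= beta else rng.randrange(num_classes)
        for label in labels
    ]

# Tiny toy "scene": 64 voxels with 5 semantic classes.
rng = random.Random(0)
scene = [rng.randrange(5) for _ in range(64)]
noisy = forward_corrupt(scene, num_classes=5, beta=0.3, rng=rng)
print(len(noisy))  # same number of voxels, labels partially resampled
```

Repeating this step drives the label distribution toward uniform noise; the learned reverse model then denoises labels step by step, which is what allows sampling whole semantic scenes (and, conditioned on a sparse point cloud, completing them).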