Diffusion models that generate images conditioned on text, such as DALL-E 2 and Stable Diffusion, have recently made a splash far beyond the computer vision community. Here, we tackle the related problem of generating point clouds, both unconditionally and conditioned on images. For the latter, we introduce a novel, geometrically motivated conditioning scheme based on projecting sparse image features into the point cloud and attaching them to each individual point at every step in the denoising process. This approach improves geometric consistency and yields greater fidelity than current methods that rely on unstructured, global latent codes. Additionally, we show how to apply recent continuous-time diffusion schemes. Our method performs on par with or above the state of the art in conditional and unconditional experiments on synthetic data, while being faster, lighter, and delivering tractable likelihoods. We also show that it can scale to diverse indoor scenes.
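The conditioning scheme described above attaches image features to individual points by projecting each 3D point into the image plane and sampling the feature map there. A minimal sketch of that projection step, assuming a pinhole camera model and nearest-neighbor sampling (the function name, shapes, and sampling choice are illustrative, not the paper's exact implementation):

```python
import numpy as np

def project_point_features(points, feat_map, K):
    """Attach image features to 3D points via pinhole projection.

    points:   (N, 3) points in camera coordinates, z > 0.
    feat_map: (H, W, C) dense image feature map.
    K:        (3, 3) camera intrinsics.
    Returns (N, C) per-point features; points that project outside
    the image receive zeros.
    """
    H, W, C = feat_map.shape
    # Pinhole projection: u = fx*x/z + cx, v = fy*y/z + cy
    uvw = (K @ points.T).T            # (N, 3) homogeneous pixel coords
    uv = uvw[:, :2] / uvw[:, 2:3]     # divide by depth
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    out = np.zeros((points.shape[0], C), dtype=feat_map.dtype)
    out[inside] = feat_map[v[inside], u[inside]]  # nearest-neighbor lookup
    return out

# Toy example: 2 points, a 4x4 feature map with 2 channels.
K = np.array([[2.0, 0.0, 2.0],
              [0.0, 2.0, 2.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 1.0],     # projects to pixel (2, 2)
                [10.0, 0.0, 1.0]])   # projects outside the image
feats = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
pf = project_point_features(pts, feats, K)
```

The resulting (N, C) features would then be concatenated to each point's coordinates before every denoising step, so the network sees image evidence aligned with the geometry rather than a single global latent code.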