We present ShapeCrafter, a neural network for recursive text-conditioned 3D shape generation. Existing methods for text-conditioned 3D shape generation consume an entire text prompt and generate a 3D shape in a single step. However, humans tend to describe shapes recursively: we may start with an initial description and progressively add details based on intermediate results. To capture this recursive process, we introduce a method that generates a 3D shape distribution conditioned on an initial phrase and gradually evolves that distribution as more phrases are added. Since existing datasets are insufficient for training this approach, we present Text2Shape++, a large dataset of 369K shape-text pairs that supports recursive shape generation. To capture the local details that are often used to refine shape descriptions, we build on vector-quantized deep implicit functions that generate a distribution of high-quality shapes. Results show that our method generates shapes consistent with text descriptions, and that the shapes evolve gradually as more phrases are added. Our method supports shape editing and extrapolation, and can enable new applications in human-machine collaboration for creative design.
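The abstract names vector-quantized deep implicit functions without giving their details. The sketch below shows only the generic vector-quantization step that such models rely on. It is not ShapeCrafter's implementation. The functions `quantize` and `dequantize` and the toy 2-D codebook are hypothetical and exist purely for illustration. Each local feature vector is replaced by the index of its nearest codebook entry under squared Euclidean distance. In VQ-based generative models generally, this turns continuous local features into discrete codes over which a generative model can define a distribution.

```python
import math


def quantize(features, codebook):
    """Return, for each feature vector, the index of its nearest codebook entry.

    Distance is squared Euclidean; ties resolve to the lowest index.
    (Hypothetical helper for illustration, not ShapeCrafter's code.)
    """
    indices = []
    for f in features:
        best_idx, best_dist = 0, math.inf
        for k, c in enumerate(codebook):
            dist = sum((fi - ci) ** 2 for fi, ci in zip(f, c))
            if dist < best_dist:
                best_idx, best_dist = k, dist
        indices.append(best_idx)
    return indices


def dequantize(indices, codebook):
    """Map discrete code indices back to their codebook vectors."""
    return [list(codebook[k]) for k in indices]


# Toy codebook with K = 3 entries in a 2-D feature space (assumed values).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
# Three "local features", e.g. from cells of a spatial grid (assumed values).
features = [[0.1, -0.2], [0.9, 1.2], [-0.1, 0.8]]

codes = quantize(features, codebook)
print(codes)                        # [0, 1, 2]
print(dequantize(codes, codebook))  # [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
```

Working with discrete indices instead of raw continuous features is what lets a downstream model treat generation as predicting a categorical distribution per location. Sampling from that distribution is one way to obtain multiple plausible shapes for the same text.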