Advances in 3D-aware generative models have pushed the boundary of image synthesis with explicit camera control. To achieve high-resolution image synthesis, several attempts have been made to design efficient generators, such as hybrid architectures combining 3D and 2D components. However, such designs compromise multi-view consistency, and designing a pure 3D generator at high resolution remains an open problem. In this work, we present Generative Volumetric Primitives (GVP), the first pure 3D generative model that can sample and render 512-resolution images in real time. GVP jointly models a set of volumetric primitives and their spatial information, both of which can be efficiently generated via a 2D convolutional network. The mixture of these primitives naturally captures the sparsity and correspondence in the 3D volume. Training such a generator with a high degree of freedom is made possible through a knowledge distillation technique. Experiments on several datasets demonstrate the superior efficiency and 3D consistency of GVP over the state of the art.
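To make the mixture-of-primitives representation concrete, the sketch below shows one simple way such a mixture could be queried: each primitive is an axis-aligned box carrying a small density grid, and a 3D point accumulates contributions from every primitive whose box contains it. This is a minimal illustration under assumed simplifications (isotropic scales, nearest-neighbour lookup, density only), not the paper's actual implementation; the function name `query_primitives` and all parameters are hypothetical.

```python
import numpy as np

def query_primitives(points, centers, scales, payloads):
    """Evaluate a mixture of axis-aligned volumetric primitives at 3D points.

    points:   (P, 3) query positions in world space
    centers:  (K, 3) primitive centers
    scales:   (K,)   primitive half-extents (isotropic, an assumed simplification)
    payloads: (K, G, G, G) per-primitive density grids
    Returns a (P,) density: each point sums contributions from every
    primitive whose box contains it, capturing sparsity (empty space
    touches no primitive) in the mixture.
    """
    K, G = payloads.shape[0], payloads.shape[1]
    density = np.zeros(len(points))
    for k in range(K):
        # Map world coordinates into the primitive's local [0, 1)^3 frame.
        local = (points - centers[k]) / (2.0 * scales[k]) + 0.5
        inside = np.all((local >= 0.0) & (local < 1.0), axis=1)
        # Nearest-neighbour voxel lookup for points inside the box.
        idx = np.clip((local[inside] * G).astype(int), 0, G - 1)
        density[inside] += payloads[k][idx[:, 0], idx[:, 1], idx[:, 2]]
    return density
```

In a generator along these lines, `centers`, `scales`, and the `payloads` grids would all be predicted from the latent code (in GVP, by a 2D convolutional network), and the accumulated densities would feed a volume-rendering step rather than being read out directly.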