This paper presents a 3D generative model that uses diffusion models to automatically generate 3D digital avatars represented as neural radiance fields. A significant challenge in generating such avatars is that the memory and processing costs in 3D are prohibitive for producing the rich details required for high-quality avatars. To tackle this problem, we propose the roll-out diffusion network (Rodin), which represents a neural radiance field as multiple 2D feature maps and rolls out these maps into a single 2D feature plane within which we perform 3D-aware diffusion. The Rodin model brings much-needed computational efficiency while preserving the integrity of diffusion in 3D by using 3D-aware convolution that attends to projected features in the 2D feature plane according to their original relationship in 3D. We also use latent conditioning to orchestrate the feature generation for global coherence, leading to high-fidelity avatars and enabling their semantic editing based on text prompts. Finally, we use hierarchical synthesis to further enhance details. The 3D avatars generated by our model compare favorably with those produced by existing generative techniques. We can generate highly detailed avatars with realistic hairstyles and facial hair such as beards. We also demonstrate 3D avatar generation from an image or text, as well as text-guided editability.
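The roll-out and cross-plane feature sharing described above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: it assumes a tri-plane representation with three axis-aligned feature planes (xy, xz, yz) of equal resolution, concatenates them into one 2D plane, and mimics 3D-aware convolution's cross-plane gathering by pooling each other plane along its non-shared axis and broadcasting the result back. All names and shapes here are hypothetical.

```python
import numpy as np

def roll_out(planes):
    # Concatenate the three axis-aligned feature planes side by side,
    # producing the single 2D feature plane on which diffusion runs.
    # Each plane has shape (C, H, W); the result is (C, H, 3W).
    return np.concatenate(planes, axis=-1)

def augment_xy(xy, xz, yz):
    # Simplified stand-in for 3D-aware convolution's input: the xy plane
    # gathers features from the xz and yz planes by averaging each over
    # its z axis and broadcasting along the axis it does not share.
    # Conventions (assumed): xy has y along H, x along W;
    # xz has z along H, x along W; yz has y along H, z along W.
    C, H, W = xy.shape
    from_xz = np.broadcast_to(xz.mean(axis=1, keepdims=True), (C, H, W))
    from_yz = np.broadcast_to(yz.mean(axis=2, keepdims=True), (C, H, W))
    return np.concatenate([xy, from_xz, from_yz], axis=0)

# Hypothetical tri-plane: 4 channels at 8x8 resolution per plane.
xy, xz, yz = (np.random.rand(4, 8, 8) for _ in range(3))
rolled = roll_out([xy, xz, yz])   # single plane of shape (4, 8, 24)
aug = augment_xy(xy, xz, yz)      # xy plane with cross-plane context, (12, 8, 8)
```

In the actual model the cross-plane features feed a convolution rather than a plain concatenation, but the sketch shows the key point: after roll-out, points that were related in 3D can still exchange information in 2D.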