We introduce DiffRF, a novel approach for 3D radiance field synthesis based on denoising diffusion probabilistic models. While existing diffusion-based methods operate on images, latent codes, or point cloud data, we are the first to directly generate volumetric radiance fields. To this end, we propose a 3D denoising model that operates directly on an explicit voxel grid representation. However, as radiance fields generated from a set of posed images can be ambiguous and contain artifacts, obtaining ground-truth radiance field samples is non-trivial. We address this challenge by pairing the denoising formulation with a rendering loss, enabling our model to learn a deviated prior that favours good image quality rather than replicating fitting errors such as floating artifacts. In contrast to 2D diffusion models, our model learns multi-view consistent priors, enabling free-view synthesis and accurate shape generation. Compared to 3D GANs, our diffusion-based approach naturally enables conditional generation such as masked completion or single-view 3D synthesis at inference time.
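The core training objective described above, a DDPM-style noise-prediction loss on explicit voxel radiance grids combined with a rendering loss on the denoised field, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the denoiser network, the differentiable renderer `render_views`, and all tensor shapes and hyperparameters are assumptions.

```python
# Hypothetical sketch (PyTorch) of a combined diffusion + rendering loss on voxel
# radiance grids. `denoiser` and `render_views` are assumed callables, not DiffRF's code.
import torch
import torch.nn.functional as F

def ddpm_schedule(num_steps: int = 1000):
    """Standard linear beta schedule and cumulative alpha products."""
    betas = torch.linspace(1e-4, 0.02, num_steps)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    return betas, alphas_cumprod

def training_step(denoiser, render_views, x0, poses, gt_images,
                  alphas_cumprod, lambda_rgb: float = 1.0):
    """
    denoiser:      3D network predicting the noise added to a voxel grid (assumed).
    render_views:  differentiable volume renderer producing images from a grid (assumed).
    x0:            batch of fitted radiance-field voxel grids, shape (B, C, D, H, W).
    poses:         camera poses used for the rendering loss.
    gt_images:     posed ground-truth images corresponding to `poses`.
    """
    B = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (B,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod.to(x0.device)[t].view(B, 1, 1, 1, 1)

    # Forward diffusion: corrupt the voxel grid at timestep t.
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

    # Standard DDPM objective: predict the added noise.
    pred_noise = denoiser(x_t, t)
    loss_diff = F.mse_loss(pred_noise, noise)

    # Recover an estimate of the clean grid and supervise its renderings, so the
    # learned prior favours image quality over replicating fitting artifacts.
    x0_hat = (x_t - (1.0 - a_bar).sqrt() * pred_noise) / a_bar.sqrt()
    rendered = render_views(x0_hat, poses)  # assumed to return images matching gt_images
    loss_rgb = F.mse_loss(rendered, gt_images)

    return loss_diff + lambda_rgb * loss_rgb
```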