We propose SparseFusion, a sparse-view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation. Existing approaches typically build on neural rendering with re-projected features but fail to generate unseen regions or handle uncertainty under large viewpoint changes. Alternate methods treat this as a (probabilistic) 2D synthesis task, and while they can generate plausible 2D images, they do not infer a consistent underlying 3D representation. However, we find that this trade-off between 3D consistency and probabilistic image generation does not need to exist. In fact, we show that geometric consistency and generative inference can be complementary in a mode-seeking behavior. By distilling a 3D-consistent scene representation from a view-conditioned latent diffusion model, we are able to recover a plausible 3D representation whose renderings are both accurate and realistic. We evaluate our approach across 51 categories in the CO3D dataset and show that it outperforms existing methods, in both distortion and perception metrics, for sparse-view novel view synthesis.
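As a rough sketch of the distillation objective implied above (the notation here is ours and serves only as an illustration, not the paper's exact formulation): let $g_{\theta}(\pi)$ denote a rendering of the 3D scene representation from camera $\pi$, $C$ the sparse input views, and $p_{\psi}(\cdot \mid \pi, C)$ the view-conditioned latent diffusion model. Distillation then seeks parameters whose renderings are assigned high likelihood under the diffusion model,

$$\theta^{*} \;=\; \arg\max_{\theta}\; \mathbb{E}_{\pi}\big[\, \log p_{\psi}\!\left( g_{\theta}(\pi) \mid \pi,\, C \right) \big],$$

which, in a mode-seeking manner, encourages the recovered 3D representation to commit to a single consistent explanation of the scene rather than averaging over the many plausible generations the 2D model could produce.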