We study the problem of inferring an object-centric scene representation from a single image, aiming to derive a representation that explains the image formation process, captures the scene's 3D nature, and is learned without supervision. Most existing methods for scene decomposition lack one or more of these characteristics, due to the fundamental challenge of integrating the complex 3D-to-2D image formation process into powerful inference schemes like deep networks. In this paper, we propose unsupervised discovery of Object Radiance Fields (uORF), integrating recent progress in neural 3D scene representations and rendering with deep inference networks for unsupervised 3D scene decomposition. Trained on multi-view RGB images without annotations, uORF learns to decompose complex scenes with diverse, textured backgrounds from a single image. We show that uORF performs well on unsupervised 3D scene segmentation, novel view synthesis, and scene editing on three datasets.