We present a method for creating 3D indoor scenes with a generative model learned from a collection of semantic-segmented depth images captured from different, unknown scenes. Given a room with a specified size, our method automatically generates the 3D objects in the room from a randomly sampled latent code. Unlike existing methods that represent an indoor scene by the type, location, and other properties of the objects it contains, and that learn the scene layout from a collection of complete 3D indoor scenes, our method models each indoor scene as a 3D semantic scene volume and learns a volumetric generative adversarial network (GAN) from a collection of 2.5D partial observations of 3D scenes. To this end, we apply a differentiable projection layer that projects the generated 3D semantic scene volume into semantic-segmented depth images, and we design a new multiple-view discriminator for learning the complete 3D scene volume from 2.5D semantic-segmented depth images. Compared to existing methods, ours not only greatly reduces the effort of modeling and acquiring 3D scenes for training, but also produces object shapes and scene layouts with finer detail. We evaluate our method on several indoor scene datasets and demonstrate its advantages. We also extend our method to generate 3D indoor scenes from semantic-segmented depth images inferred from RGB images of real scenes.
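To make the projection step concrete, the following is a minimal sketch of a differentiable projection layer in PyTorch. It is not the paper's implementation: it assumes an orthographic camera looking along the depth axis of the volume, and the function name `project_volume` and its tensor layout are hypothetical. The idea is that per-voxel occupancy defines a soft ray-termination distribution, from which an expected depth map and an expected per-pixel semantic map can be computed with fully differentiable operations.

```python
import torch
import torch.nn.functional as F

def project_volume(sem_logits, occ_logits):
    """Differentiable orthographic projection of a semantic scene volume.

    A simplified sketch; the paper's layer may use perspective cameras
    and a different termination model.

    sem_logits: (B, C, D, H, W) per-voxel semantic class logits
    occ_logits: (B, 1, D, H, W) per-voxel occupancy logits
    Returns:
      depth: (B, 1, H, W) expected depth along the D axis, in [0, 1]
      sem:   (B, C, H, W) expected per-pixel class probabilities
    """
    B, C, D, H, W = sem_logits.shape
    occ = torch.sigmoid(occ_logits)  # occupancy probability in [0, 1]

    # Transmittance: probability the ray is still unblocked before voxel d.
    free = torch.cumprod(1.0 - occ + 1e-6, dim=2)
    trans = torch.cat([torch.ones_like(free[:, :, :1]), free[:, :, :-1]], dim=2)
    weight = trans * occ  # probability the ray terminates at voxel d

    # Expected depth: weighted sum of normalized voxel depths.
    depth_vals = torch.linspace(0.0, 1.0, D, device=sem_logits.device)
    depth = (weight * depth_vals.view(1, 1, D, 1, 1)).sum(dim=2)

    # Expected semantics: class probabilities weighted by termination.
    sem_prob = F.softmax(sem_logits, dim=1)
    sem = (weight * sem_prob).sum(dim=2)
    return depth, sem
```

Because both outputs are weighted sums of differentiable quantities, gradients from a 2.5D discriminator applied to the projected depth and semantic maps flow back into the 3D generator, which is what allows the complete scene volume to be learned from partial 2.5D observations.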