Despite the recent advancement of Generative Adversarial Networks (GANs) in learning 3D-aware image synthesis from 2D data, existing methods fail to model indoor scenes due to the large diversity of room layouts and of the objects inside them. We argue that indoor scenes do not share an intrinsic structure, and hence using only 2D images cannot adequately guide the model with respect to the 3D geometry. In this work, we fill this gap by introducing depth as a 3D prior. Compared with other 3D data formats, depth better fits the convolution-based generation mechanism and is more easily accessible in practice. Specifically, we propose a dual-path generator, where one path is responsible for depth generation and its intermediate features are injected into the other path as the condition for appearance rendering. Such a design eases 3D-aware synthesis with explicit geometry information. Meanwhile, we introduce a switchable discriminator that both differentiates real from fake domains and predicts the depth of a given input. In this way, the discriminator can take the spatial arrangement into account and advise the generator to learn an appropriate depth condition. Extensive experimental results suggest that our approach is capable of synthesizing indoor scenes with impressively good quality and 3D consistency, significantly outperforming state-of-the-art alternatives.
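The dual-path design described above can be illustrated with a minimal sketch. This is not the paper's actual architecture; layer widths, spatial sizes, and the stand-in `conv_like` operator are all assumptions chosen only to show the data flow: the depth path produces a depth map, its intermediate features condition the appearance path, and the discriminator has two heads, one scoring real vs. fake and one predicting depth.

```python
# A toy sketch (assumed shapes and layers, NOT the paper's architecture)
# of a dual-path generator with depth-feature injection, plus a
# switchable discriminator with a real/fake head and a depth head.
import numpy as np

rng = np.random.default_rng(0)

def conv_like(x, out_ch):
    """Stand-in for a conv layer: random channel-mixing linear map + ReLU."""
    w = rng.standard_normal((out_ch, x.shape[0]))
    y = w @ x.reshape(x.shape[0], -1)
    return np.maximum(y, 0).reshape(out_ch, *x.shape[1:])

def generator(z, h=8, w=8):
    # Broadcast the latent code over a small spatial grid.
    base = z.reshape(-1, 1, 1) * np.ones((z.size, h, w))
    # Depth path: latent -> intermediate depth features -> 1-channel depth map.
    depth_feat = conv_like(base, 16)
    depth = conv_like(depth_feat, 1)
    # Appearance path: the depth features are injected as the condition,
    # so appearance rendering is guided by explicit geometry.
    app_feat = conv_like(depth_feat, 16)
    rgb = conv_like(app_feat, 3)
    return depth, rgb

def discriminator(rgb):
    feat = conv_like(rgb, 16)
    realness = feat.mean()            # head 1: real-vs-fake score
    pred_depth = conv_like(feat, 1)   # head 2: depth prediction ("switchable")
    return realness, pred_depth

depth, rgb = generator(rng.standard_normal(32))
score, pred_depth = discriminator(rgb)
print(depth.shape, rgb.shape, pred_depth.shape)  # (1, 8, 8) (3, 8, 8) (1, 8, 8)
```

Because the depth head of the discriminator is supervised on real images with known depth, its feedback pushes the generator's depth path toward plausible spatial arrangements, which is the role the abstract assigns to the switchable discriminator.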