Despite the recent progress of generative adversarial networks (GANs) at synthesizing photo-realistic images, producing complex urban scenes remains a challenging problem. Previous works break scene generation down into two consecutive phases: unconditional semantic layout synthesis and image synthesis conditioned on layouts. In this work, we propose to condition layout generation as well, for higher semantic control: given a vector of class proportions, we generate layouts with matching composition. To this end, we introduce a conditional framework with novel architecture designs and learning objectives, which effectively accommodates class proportions to guide the scene generation process. The proposed architecture also enables partial layout editing, with interesting applications. Thanks to this semantic control, we can produce layouts close to the real distribution, which enhances the whole scene generation process. Our models outperform existing baselines on several metrics and urban scene benchmarks. Moreover, we demonstrate the merit of our approach for data augmentation: semantic segmenters trained on real layout-image pairs together with additional pairs generated by our approach outperform models trained on real pairs only.