Image generation has evolved rapidly in recent years. Modern adversarial training architectures can generate even high-resolution images with remarkable quality. At the same time, increasing effort is devoted to controlling the content of generated images. In this paper, we take a further step in this direction and propose a conditional generative adversarial network (GAN) that generates images with a defined number of objects from given classes. This entails two fundamental abilities: (1) generating high-quality images under a complex constraint and (2) counting object instances per class in a given image. Our proposed model modularly extends the successful StyleGAN2 architecture with count-based conditioning and with a regression sub-network that counts the number of generated objects per class during training. In experiments on three different datasets, we show that the proposed model learns to generate images according to the given multiple-class count condition, even in the presence of complex backgrounds. In particular, we introduce a new dataset, CityCount, derived from the Cityscapes street scenes dataset, to evaluate our approach in a challenging and practically relevant scenario.
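The two components named in the abstract, a multiple-class count condition and a count regression objective, can be pictured with a minimal sketch. This is not the paper's implementation: the class list, the scaling of counts by a maximum, the mean squared error loss, and the names `CLASSES`, `count_condition`, and `count_regression_loss` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical class list; the abstract does not name the object classes.
CLASSES: List[str] = ["car", "person"]


def count_condition(counts: Dict[str, int], max_count: int) -> List[float]:
    """Encode per-class object counts as a vector scaled to [0, 1].

    Assumption: counts are divided by a fixed maximum; the abstract does
    not say how the count condition is encoded.
    """
    vec = []
    for name in CLASSES:
        c = counts.get(name, 0)  # classes not mentioned default to zero objects
        if not 0 <= c <= max_count:
            raise ValueError(f"count for {name!r} out of range: {c}")
        vec.append(c / max_count)
    return vec


def count_regression_loss(predicted: Sequence[float],
                          target: Sequence[float]) -> float:
    """Mean squared error between predicted and requested count vectors.

    Assumption: MSE stands in for whatever loss the regression
    sub-network is trained with.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Usage: request two cars and one person, with at most four objects per class.
condition = count_condition({"car": 2, "person": 1}, max_count=4)
loss = count_regression_loss(predicted=condition, target=condition)
```

In a setup like this, the condition vector would be fed to the generator alongside the latent code, and the loss would penalize generated images whose predicted counts differ from the requested ones.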