In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, a sketch, strokes, a graph, or a layout, and have a computer system automatically generate photo-realistic images that adhere to the input content. While classic works enabling such automatic image content generation followed a framework of image retrieval and composition, recent advances in deep generative models, such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods, have enabled more powerful and versatile image generation. This paper reviews recent works on image synthesis from intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. The review motivates new perspectives on input representation and interactivity, cross-pollination between major image generation paradigms, and the evaluation and comparison of generation methods.